*****************************************
Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
*****************************************
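The launcher pins each rank to a single OpenMP thread by default. A minimal sketch of the tuning it suggests, assuming a hypothetical 8-rank single-node job where each rank gets an equal share of the cores:

```shell
# Assumption: 8 local ranks; divide the node's cores evenly across them
# instead of keeping the launcher's default of OMP_NUM_THREADS=1.
RANKS=8
CORES=$(nproc)
THREADS=$(( CORES / RANKS ))
# Never go below one thread per rank (e.g. on small machines).
if [ "$THREADS" -lt 1 ]; then THREADS=1; fi
export OMP_NUM_THREADS=$THREADS
# The job would then be launched in this environment, e.g.:
#   deepspeed --num_gpus=$RANKS train.py
echo "$OMP_NUM_THREADS"
```

The right value is workload-dependent; oversubscribing threads across ranks can be slower than the conservative default.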
*****************************************
--------------------------------------------------
DeepSpeed C++/CUDA extension op report
--------------------------------------------------
NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meets the required dependencies to JIT install the op.
--------------------------------------------------
JIT compiled ops requires ninja
ninja .................. [OKAY]
--------------------------------------------------
op name ................ installed .. compatible
--------------------------------------------------
cpu_adam ............... [YES] ...... [OKAY]
fused_adam ............. [NO] ....... [OKAY]
fused_lamb ............. [NO] ....... [OKAY]
sparse_attn ............ [NO] ....... [OKAY]
transformer ............ [NO] ....... [OKAY]
stochastic_transformer . [NO] ....... [OKAY]
--------------------------------------------------
 [WARNING]  async_io requires the libraries: ['libaio-dev'] but they are missing. Can be fixed by: `apt install libaio-dev`.
async_io ............... [NO] ....... [NO]
transformer_inference .. [NO] ....... [OKAY]
utils .................. [YES] ...... [OKAY]
quantizer .............. [NO] ....... [OKAY]
--------------------------------------------------
[OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. 
-------------------------------------------------- JIT compiled ops requires ninja ***************************************** Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. ***************************************** ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. 
-------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. 
Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. 
-------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] transformer ............ [NO] ....... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- stochastic_transformer . [NO] ....... [OKAY] ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... 
[OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. 
Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. transformer_inference .. [NO] ....... [OKAY] async_io ............... [NO] ....... [NO] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] transformer_inference .. [NO] ....... [OKAY] -------------------------------------------------- utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible ninja .................. [OKAY] -------------------------------------------------- cpu_adam ............... [YES] ...... 
[OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- fused_adam ............. [NO] ....... [OKAY] cpu_adam ............... [YES] ...... [OKAY] fused_lamb ............. [NO] ....... [OKAY] fused_adam ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... 
[OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`.  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_ioasync_io .............................. [NO][NO] .............. [NO][NO] transformer_inferencetransformer_inference .... [NO][NO] .............. [OKAY][OKAY] utilsutils .................................... [YES][YES] ............ [OKAY][OKAY] quantizer quantizer.............. ..............[NO] [NO]....... .......[OKAY] [OKAY] -------------------------------------------------- -------------------------------------------------- ***************************************** Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. ***************************************** ***************************************** Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. ***************************************** -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. 
-------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... 
[OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_ioasync_io .............................. [NO][NO] .............. [NO][NO] transformer_inference transformer_inference.. ..[NO] [NO]....... ....... [OKAY][OKAY] utilsutils .................................... [YES][YES] ............ [OKAY][OKAY] quantizerquantizer ............................ [NO][NO] .............. [OKAY][OKAY] ---------------------------------------------------------------------------------------------------- ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] ***************************************** Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. 
***************************************** -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 
11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. 
Can be fixed by: `apt install libaio-dev`.
 [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`.
async_io ............... [NO] ....... [NO]
transformer_inference .. [NO] ....... [OKAY]
utils .................. [YES] ...... [OKAY]
quantizer .............. [NO] ....... [OKAY]
--------------------------------------------------
--------------------------------------------------
DeepSpeed C++/CUDA extension op report
--------------------------------------------------
NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.
--------------------------------------------------
JIT compiled ops requires ninja
ninja .................. [OKAY]
--------------------------------------------------
op name ................ installed .. compatible
--------------------------------------------------
cpu_adam ............... [YES] ...... [OKAY]
fused_adam ............. [NO] ....... [OKAY]
fused_lamb ............. [NO] ....... [OKAY]
sparse_attn ............ [NO] ....... [OKAY]
transformer ............ [NO] ....... [OKAY]
stochastic_transformer . [NO] ....... [OKAY]
--------------------------------------------------
DeepSpeed general environment info:
torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']
torch version .................... 1.8.1
torch cuda version ............... 11.1
nvcc version ..................... 11.2
deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']
deepspeed info ................... 0.4.2+bc17042, bc17042, big-science
deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1
/bin/sh: line 0: type: git: not found
**** Git info for Megatron: git_hash=unknown git_branch=unknown ****
-------------------------------------------------- JIT compiled ops requires ninja  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- ninja .................. [OKAY] -------------------------------------------------- DeepSpeed general environment info: op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] fused_adam ............. [NO] ....... [OKAY] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science fused_lamb ............. [NO] ....... [OKAY] deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info:DeepSpeed general environment info: sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] torch install pathtorch install path .............................. ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch versiontorch version ........................................ 1.8.11.8.1 stochastic_transformer . [NO] ....... [OKAY] torch cuda versiontorch cuda version .............................. 11.111.1 nvcc versionnvcc version .......................................... 
11.211.2 deepspeed install pathdeepspeed install path ...................... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed infodeepspeed info ...................................... 0.4.2+bc17042, bc17042, big-science0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w.deepspeed wheel compiled w. ............ torch 1.8, cuda 11.1torch 1.8, cuda 11.1 -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_ioasync_io .............................. [NO][NO] .............. [NO][NO] transformer_inferencetransformer_inference .... [NO][NO] .............. [OKAY][OKAY] utilsutils .................................... [YES][YES] ............ [OKAY][OKAY] quantizer quantizer.............. ..............[NO] [NO]....... .......[OKAY] [OKAY] -------------------------------------------------- -------------------------------------------------- ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... 
[OKAY] stochastic_transformer . [NO] ....... [OKAY] DeepSpeed general environment info:DeepSpeed general environment info: torch install pathtorch install path .............................. ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch versiontorch version ........................................ 1.8.11.8.1 torch cuda versiontorch cuda version .............................. 11.111.1 nvcc versionnvcc version .......................................... 11.211.2 deepspeed install pathdeepspeed install path ...................... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed infodeepspeed info ...................................... 0.4.2+bc17042, bc17042, big-science0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w.deepspeed wheel compiled w. ............ torch 1.8, cuda 11.1torch 1.8, cuda 11.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- /bin/sh: line 0: type: git: not found  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... 
[OKAY] -------------------------------------------------- **** Git info for Megatron: git_hash=unknown git_branch=unknown **** -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. 
...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] /bin/sh: line 0: type: git: not found torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 **** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info:DeepSpeed general environment info: torch install path ...............torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... torch version1.8.1 .................... torch cuda version1.8.1 ............... 11.1torch cuda version nvcc version............... .....................11.1 11.2nvcc version deepspeed install path..................... 
...........11.2 deepspeed install path['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] ...........deepspeed info ...................['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] 0.4.2+bc17042, bc17042, big-sciencedeepspeed info deepspeed wheel compiled w.................... ......0.4.2+bc17042, bc17042, big-science torch 1.8, cuda 11.1 deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown ****  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... 
[OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown ****  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... 
[NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown ******** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown ****  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... 
[OKAY] -------------------------------------------------- /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info: DeepSpeed general environment info:torch install path ............... torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'].................... 1.8.1 torch version ....................torch cuda version 1.8.1............... 11.1torch cuda version nvcc version............... .....................11.1 11.2nvcc version deepspeed install path..................... ...........11.2 deepspeed install path['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] ........... deepspeed info ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']................... 0.4.2+bc17042, bc17042, big-sciencedeepspeed info ...................deepspeed wheel compiled w. 0.4.2+bc17042, bc17042, big-science...... torch 1.8, cuda 11.1deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... 
[OKAY] -------------------------------------------------- /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown ******** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info: torch install path ............... DeepSpeed general environment info: ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']torch install path ...............torch version .................... 1.8.1 ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']torch cuda version ...............torch version 11.1.................... nvcc version1.8.1 ..................... 11.2torch cuda version ...............deepspeed install path ...........11.1 nvcc version['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] ..................... deepspeed info11.2 ...................deepspeed install path 0.4.2+bc17042, bc17042, big-science........... deepspeed wheel compiled w.['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] ...... deepspeed infotorch 1.8, cuda 11.1 ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info:DeepSpeed general environment info: torch install pathtorch install path ............... ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch versiontorch version ........................................ 1.8.11.8.1DeepSpeed general environment info: torch cuda versiontorch cuda version .............................. 11.111.1 torch install pathnvcc versionnvcc version ......................................................... 11.211.2 deepspeed install pathdeepspeed install path ...................... 
['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed infotorch version deepspeed info .................... ................... ................... 1.8.1 0.4.2+bc17042, bc17042, big-science 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w.torch cuda versiondeepspeed wheel compiled w. ........................... torch 1.8, cuda 11.111.1torch 1.8, cuda 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown ******** Git info for Megatron: git_hash=unknown git_branch=unknown **** ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. 
compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info:DeepSpeed general environment info: torch install pathtorch install path .............................. ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch versiontorch version ........................................ 1.8.11.8.1 torch cuda versiontorch cuda version .............................. 11.111.1 nvcc versionnvcc version .......................................... 11.211.2 deepspeed install pathdeepspeed install path ...................... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed infodeepspeed info ...................................... 0.4.2+bc17042, bc17042, big-science0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w.deepspeed wheel compiled w. ............ torch 1.8, cuda 11.1torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... 
torch 1.8, cuda 11.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. 
/bin/sh: line 0: type: git: not found
**** Git info for Megatron: git_hash=unknown git_branch=unknown ****
--------------------------------------------------
DeepSpeed C++/CUDA extension op report
--------------------------------------------------
NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.
--------------------------------------------------
JIT compiled ops requires ninja
ninja .................. [OKAY]
--------------------------------------------------
op name ................ installed .. compatible
--------------------------------------------------
cpu_adam ............... [YES] ...... [OKAY]
fused_adam ............. [NO] ....... [OKAY]
fused_lamb ............. [NO] ....... [OKAY]
sparse_attn ............ [NO] ....... [OKAY]
transformer ............ [NO] ....... [OKAY]
stochastic_transformer . [NO] ....... [OKAY]
 [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`.
async_io ............... [NO] ....... [NO]
transformer_inference .. [NO] ....... [OKAY]
utils .................. [YES] ...... [OKAY]
quantizer .............. [NO] ....... [OKAY]
--------------------------------------------------
DeepSpeed general environment info:
torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']
torch version .................... 1.8.1
torch cuda version ............... 11.1
nvcc version ..................... 11.2
deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']
deepspeed info ................... 0.4.2+bc17042, bc17042, big-science
deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1
[... the same git warnings, op report, async_io warning, and environment info are repeated (and interleaved) by every other rank ...]
[OKAY][OKAY][OKAY] [OKAY] -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- op nameop nameop name op name ................ ................ ................................ installed installed installed ..installed .. .. ..compatible compatible compatible compatible------------------------------------------------------------------------------------------------------------------------------------------------------ -------------------------------------------------- cpu_adamcpu_adam cpu_adam ...............cpu_adam ..............................[YES] ............... [YES] ...... [YES][YES] ...... [OKAY]......[OKAY] ...... [OKAY][OKAY] fused_adam fused_adam............. fused_adam[NO]fused_adam............. ....... .......................... [NO] [OKAY] [NO][NO] ....... .......fused_lamb[OKAY] ....... ............. [OKAY] [OKAY] [NO] fused_lamb fused_lamb....... fused_lamb ............. .............[OKAY]............. [NO][NO][NO] ..................... [OKAY][OKAY][OKAY] sparse_attn ............ [NO] ....... [OKAY] sparse_attntransformer sparse_attn ............sparse_attn ............ ............ [NO] [NO]............ .......[NO] .......[NO] ....... [OKAY][OKAY] ....... [OKAY] [OKAY]stochastic_transformer transformer.transformertransformer [NO]............ ............ [NO]............ ....... [NO] [NO]....... [OKAY] ....... ....... [OKAY] [OKAY] [OKAY] stochastic_transformerstochastic_transformerstochastic_transformer ... [NO][NO][NO] ..................... 
[OKAY][OKAY][OKAY] ---------------------------------------------------------------------------------------------------- --------------------------------------------------DeepSpeed C++/CUDA extension op report DeepSpeed C++/CUDA extension op report -------------------------------------------------- DeepSpeed C++/CUDA extension op report NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.-------------------------------------------------- ---------------------------------------------------------------------------------------------------- -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.DeepSpeed C++/CUDA extension op reportNOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.JIT compiled ops requires ninja ------------------------------------------------------------------------------------------------------------------------------------------------------ JIT compiled ops requires ninjaJIT compiled ops requires ninjaNOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninjaninjaninjaninja .................. .................................... 
..................[OKAY] [OKAY][OKAY][OKAY] -------------------------------------------------- -------------------------------------------------- --------------------------------------------------op name--------------------------------------------------op name ................op name................op name ................installed................installed ..installedinstalled.. compatiblecompatible.. .. -------------------------------------------------- --------------------------------------------------compatible compatible ---------------------------------------------------------------------------------------------------- cpu_adam cpu_adam............... [YES]............... cpu_adamcpu_adam [YES]..................... [OKAY] ..................... [OKAY] [YES][YES] ............ [OKAY][OKAY] fused_adam fused_adam............. .............[NO] [NO]....... fused_adamfused_adam....... [OKAY] ............. .............[OKAY] [NO] fused_lamb [NO] fused_lamb.................... ....................[NO][OKAY] ....... [NO][OKAY][OKAY] fused_lamb ....... fused_lamb .............[OKAY] .............[NO] [NO]....... .......[OKAY] [OKAY]sparse_attn ............ [NO] ....... sparse_attn[OKAY] ............ [NO]transformer ................... sparse_attnsparse_attn[OKAY] [NO] ............transformer ............ .......[NO] ............ [OKAY][NO] ....... [NO] [OKAY] .......stochastic_transformer ....... [OKAY] transformer.[OKAY] ............[NO]transformer stochastic_transformer[NO] ....... ............ ........ [OKAY] [NO][OKAY] [NO] .............. stochastic_transformer[OKAY][OKAY] .stochastic_transformer [NO] ........ [NO][OKAY] ....... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- --------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. 
Op compatibility means that your system meet the required dependencies to JIT install the op. --------------------------------------------------DeepSpeed C++/CUDA extension op report JIT compiled ops requires ninja-------------------------------------------------- -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. DeepSpeed C++/CUDA extension op report-------------------------------------------------- -------------------------------------------------- JIT compiled ops requires ninja--------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. --------------------------------------------------DeepSpeed C++/CUDA extension op report JIT compiled ops requires ninja -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninjaninjaninja .................. .................. ninja ..................[OKAY][OKAY] [OKAY]..................---------------------------------------------------------------------------------------------------- [OKAY] -------------------------------------------------- op nameop name -------------------------------------------------- op name................ ................ op nameinstalled................installed ................installed.. .. installed compatiblecompatible.. .. 
--------------------------------------------------compatible--------------------------------------------------compatible ---------------------------------------------------------------------------------------------------- cpu_adamcpu_adam ..............................cpu_adamcpu_adam [YES]...............[YES] ......[YES]............... ...... [OKAY] ...... [YES] [OKAY] [OKAY] ...... [OKAY] fused_adam ............. fused_adamfused_adam[NO] fused_adam....... ............. ............. [OKAY] .............[NO][NO] .......[NO].......fused_lamb [OKAY].................... [OKAY] [NO] [OKAY].......fused_lamb [OKAY].............fused_lamb fused_lamb [NO].......................... .......[NO][NO] [OKAY].............. sparse_attn [OKAY][OKAY]............ [NO] ....... [OKAY] sparse_attntransformer ........................ [NO]sparse_attn[NO] sparse_attn ................... ................... [OKAY] [NO][NO] [OKAY] .............. stochastic_transformer [OKAY] transformer[OKAY] ............. [NO]transformer[NO] ....... transformer............ ....... [OKAY][OKAY]............ [NO] [NO]....... stochastic_transformer.......[OKAY] [OKAY]. stochastic_transformer[NO] stochastic_transformer........ [OKAY][NO] . .......[NO] [OKAY]....... [OKAY] ninjaninjaninja ninja .................................... .................. ..................[OKAY] [OKAY][OKAY][OKAY] ------------------------------------------------------------------------------------------------------------------------------------------------------ -------------------------------------------------- op nameop nameop name op name ................ ................................installed ................installed..installed installed ..compatible compatible.... -------------------------------------------------- --------------------------------------------------compatiblecompatible ---------------------------------------------------------------------------------------------------- cpu_adam ............... 
cpu_adam[YES] ...............cpu_adam......cpu_adam ...............[YES][OKAY]............... ...... [YES] [YES][OKAY] ............ [OKAY][OKAY] fused_adam ............. [NO] ....... [OKAY]fused_adam fused_adam.............fused_adam fused_lamb ............. [NO] ............. .............[NO] ....... [NO] [NO] .......[OKAY].............. [OKAY][OKAY][OKAY] fused_lamb fused_lamb.............fused_lamb ............. [NO]............. [NO].......[NO] sparse_attn[OKAY]....... ....... ............ [OKAY] [OKAY] [NO] ....... [OKAY] transformer ............ sparse_attn[NO] ................... sparse_attn[OKAY] [NO]sparse_attn ............................... stochastic_transformer[OKAY] [NO][NO] ...............transformer [NO][OKAY]............ [OKAY]....... [NO][OKAY]transformer transformer ................... ............[OKAY][NO] [NO]....... .......[OKAY] [OKAY]stochastic_transformer .stochastic_transformerstochastic_transformer [NO] ......... [OKAY] [NO] [NO] .............. [OKAY][OKAY] ---------------------------------------------------------------------------------------------------- DeepSpeed C++/CUDA extension op reportDeepSpeed C++/CUDA extension op report -------------------------------------------------- ---------------------------------------------------------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.DeepSpeed C++/CUDA extension op reportNOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. 
-------------------------------------------------- -------------------------------------------------- ----------------------------------------------------------------------------------------------------DeepSpeed C++/CUDA extension op report JIT compiled ops requires ninjaJIT compiled ops requires ninja-------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- --------------------------------------------------JIT compiled ops requires ninja JIT compiled ops requires ninja ninjaninjaninjaninja .................. ..................[OKAY].................. .................. [OKAY]-------------------------------------------------- [OKAY] [OKAY] --------------------------------------------------op name -------------------------------------------------- -------------------------------------------------- op name................ op name................installed op name ................ ..installed ................installed compatible ..installed.. -------------------------------------------------- compatible..compatible compatible---------------------------------------------------------------------------------------------------- -------------------------------------------------- cpu_adam ............... [YES] ...... cpu_adam[OKAY]cpu_adam cpu_adam.............................. ...............[YES][YES] ......[YES]...... [OKAY]......[OKAY] fused_adam [OKAY]............. [NO] ....... [OKAY] fused_adamfused_adamfused_lamb .......................................fused_adam [NO][NO].............[NO] ..................... [NO][OKAY] [OKAY][OKAY] ....... 
fused_lamb[OKAY] fused_lamb ............. .............[NO] fused_lamb [NO] ....... sparse_attn .................... [OKAY] ............[NO] [OKAY][NO]....... .......[OKAY] [OKAY] transformer ............sparse_attn sparse_attn[NO]............ .......sparse_attn............[NO] [OKAY] .......[NO]............ [OKAY][NO]....... stochastic_transformer.......[OKAY] transformer[OKAY] transformer............. ............[NO][NO]transformer [NO].......................... [OKAY][OKAY]....... [NO] [OKAY]....... stochastic_transformer[OKAY] .stochastic_transformer [NO]stochastic_transformer ........ .[NO][OKAY] [NO]....... .......[OKAY] [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... 
[OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. transformer_inference .. [NO] ....... [OKAY] utils .................. async_io[YES] ..................... [NO][OKAY] ....... [NO] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... 
[NO]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] async_io...... ...............[OKAY] [NO] ....... [NO] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO]async_io ....... ...............[NO] [NO] ....... [NO] transformer_inference .. transformer_inference[NO] ......... [NO][OKAY] ....... [OKAY] utils .................. utils[YES] ........................ [YES][OKAY] ...... [OKAY] quantizer .............. [NO] quantizer....... ..............[OKAY] [NO] ....... --------------------------------------------------[OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`.  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] transformer_inference ..utils [NO].................. 
.......[YES] [OKAY]...... [OKAY] utils quantizer.................. ..............[YES] [NO]...... .......[OKAY] [OKAY] quantizer .............. --------------------------------------------------[NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ...............async_io [NO] ...................... [NO][NO] ....... [NO] transformer_inference .. transformer_inference[NO] ......... [NO][OKAY] ....... [OKAY] utils ..................utils [YES].................. ......[YES] [OKAY]...... [OKAY] quantizer ..............quantizer ..............[NO] [NO]....... .......[OKAY] [OKAY] -------------------------------------------------- --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_ioasync_io .............................. [NO][NO] .............. [NO][NO] transformer_inference transformer_inference.. ..[NO] [NO] .............. [OKAY][OKAY] utilsutils .................................... [YES][YES] ............ [OKAY][OKAY] quantizerquantizer ............................ [NO][NO] .............. [OKAY][OKAY] -------------------------------------------------- --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ...............async_io [NO] ...................... [NO][NO] ....... [NO] transformer_inference transformer_inference.. 
..[NO] [NO]....... .......[OKAY] [OKAY] utils ..................utils [YES].................. ...... [YES][OKAY] ...... [OKAY] quantizer quantizer.............. ..............[NO] [NO]....... .......[OKAY] [OKAY] -------------------------------------------------- --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io async_io............... ...............[NO] [NO]....... .......[NO] [NO] transformer_inference .. [NO] ....... [OKAY] transformer_inference utils.. ..................[NO] [YES]....... ......[OKAY] [OKAY] quantizer ..............utils [NO].................. .......[YES] [OKAY]...... [OKAY] -------------------------------------------------- quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- -------------------------------------------------- --------------------------------------------------DeepSpeed C++/CUDA extension op report --------------------------------------------------DeepSpeed C++/CUDA extension op report NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.-------------------------------------------------- -------------------------------------------------- --------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. 
JIT compiled ops requires ninja--------------------------------------------------DeepSpeed C++/CUDA extension op report -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.-------------------------------------------------- --------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. JIT compiled ops requires ninja-------------------------------------------------- JIT compiled ops requires ninja  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`.  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] async_io....... ...............[NO] [NO] ....... [NO] transformer_inference .. [NO] ....... transformer_inference[OKAY] .. [NO] ....... [OKAY] utils .................. [YES] ...... utils[OKAY] .................. [YES] ...... quantizer[OKAY] .............. [NO] .......quantizer [OKAY].............. [NO] ....... [OKAY]-------------------------------------------------- --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io async_io............... [NO]............... .......[NO] [NO]....... [NO] transformer_inference ..transformer_inference [NO].. .......[NO] [OKAY]....... [OKAY] utils .................. 
--------------------------------------------------
DeepSpeed C++/CUDA extension op report
--------------------------------------------------
NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.
--------------------------------------------------
JIT compiled ops requires ninja
--------------------------------------------------
ninja .................. [OKAY]
--------------------------------------------------
op name ................ installed .. compatible
--------------------------------------------------
cpu_adam ............... [YES] ...... [OKAY]
fused_adam ............. [NO] ....... [OKAY]
fused_lamb ............. [NO] ....... [OKAY]
sparse_attn ............ [NO] ....... [OKAY]
transformer ............ [NO] ....... [OKAY]
stochastic_transformer . [NO] ....... [OKAY]
 [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`.
async_io ............... [NO] ....... [NO]
transformer_inference .. [NO] ....... [OKAY]
utils .................. [YES] ...... [OKAY]
quantizer .............. [NO] ....... [OKAY]
--------------------------------------------------
DeepSpeed general environment info:
torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']
torch version .................... 1.8.1
torch cuda version ............... 11.1
nvcc version ..................... 11.2
deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']
deepspeed info ................... 0.4.2+bc17042, bc17042, big-science
deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1
/bin/sh: line 0: type: git: not found
**** Git info for Megatron: git_hash=unknown git_branch=unknown ****
(the op report, libaio warning, environment info, and git messages above are emitted once per rank; the interleaved duplicates from the remaining ranks have been elided)
torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... 
torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown ******** Git info for Megatron: git_hash=unknown git_branch=unknown **** -------------------------------------------------- --------------------------------------------------DeepSpeed C++/CUDA extension op report ----------------------------------------------------------------------------------------------------DeepSpeed C++/CUDA extension op report NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.--------------------------------------------------DeepSpeed C++/CUDA extension op report --------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.-------------------------------------------------- JIT compiled ops requires ninja--------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. JIT compiled ops requires ninja-------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninjaninjaninjaninja ...................................................... 
..................[OKAY] [OKAY][OKAY][OKAY] -------------------------------------------------- ------------------------------------------------------------------------------------------------------------------------------------------------------ op name op name op name op name................ ................ ................ ................installed installedinstalled installed .. .... .. compatiblecompatible compatible compatible ---------------------------------------------------------------------------------------------------- -------------------------------------------------- -------------------------------------------------- cpu_adam cpu_adam............... cpu_adam ............... [YES] ............... [YES] ...... [YES] ...... [OKAY] ...... cpu_adam [OKAY] [OKAY] ............... [YES]fused_adam ................... fused_adam fused_adam[OKAY][NO] .......................... ....... [NO] [NO] [OKAY] ....... ....... [OKAY][OKAY] fused_lamb .............fused_lamb [NO]fused_lamb............. .......[NO] ............. [OKAY] ....... [NO]fused_adam [OKAY].................... [OKAY][NO] ....... sparse_attn[OKAY] ............sparse_attn sparse_attn[NO]............ ...................[NO] [NO] fused_lamb[OKAY] ....... ....... [OKAY][OKAY] .............transformer transformertransformer [NO] .................................... [NO][NO][NO] ............................ [OKAY][OKAY][OKAY][OKAY] stochastic_transformer stochastic_transformerstochastic_transformer. [NO].. .......[NO][NO] [OKAY].............. [OKAY][OKAY] sparse_attn ............ [NO] ....... [OKAY] DeepSpeed general environment info:DeepSpeed general environment info: transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] torch install pathtorch install path .............................. 
['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch version ....................torch cuda version 1.8.1............... 11.1 torch cuda versionnvcc version .................................... 11.111.2 nvcc versiondeepspeed install path ................................ 11.2 ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']deepspeed install path deepspeed info ................... 0.4.2+bc17042, bc17042, big-science........... deepspeed wheel compiled w. ...... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']torch 1.8, cuda 11.1 deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] DeepSpeed general environment info:torch version .................... 1.8.1 torch install pathtorch cuda version .............................. 11.1 nvcc version ..................... 11.2 ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']deepspeed install path ........... torch version ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'].................... deepspeed info1.8.1 ................... 0.4.2+bc17042, bc17042, big-sciencetorch cuda version deepspeed wheel compiled w................ ......11.1 torch 1.8, cuda 11.1nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... 
torch 1.8, cuda 11.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`.  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] transformer_inference .. [NO] utils....... ..................[OKAY] [YES] ...... [OKAY] utils ..................quantizer [YES].............. ......[NO] [OKAY]....... [OKAY] quantizer-------------------------------------------------- .............. [NO] ....... [OKAY] -------------------------------------------------- ------------------------------------------------------------------------------------------------------------------------------------------------------ DeepSpeed C++/CUDA extension op report--------------------------------------------------DeepSpeed C++/CUDA extension op reportDeepSpeed C++/CUDA extension op report ---------------------------------------------------------------------------------------------------- -------------------------------------------------- DeepSpeed C++/CUDA extension op report NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. 
-------------------------------------------------- -------------------------------------------------- ---------------------------------------------------------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. JIT compiled ops requires ninjaJIT compiled ops requires ninja -------------------------------------------------- JIT compiled ops requires ninja JIT compiled ops requires ninja  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_ioasync_io .............................. [NO][NO] .............. [NO][NO] transformer_inference transformer_inference.. ..[NO] [NO]....... .......[OKAY] [OKAY] utilsutils .................................... [YES][YES] ............ [OKAY][OKAY] quantizerquantizer ............................ [NO][NO] .............. [OKAY][OKAY] ----------------------------------------------------------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... async_io[NO] ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] transformer_inference ..utils [NO].................. .......[YES] [OKAY]...... [OKAY] utilsquantizer ................................ [NO][YES] ............. [OKAY][OKAY] -------------------------------------------------- quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. 
Can be fixed by: `apt install libaio-dev`. [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ...............async_io [NO] ...................... [NO][NO] ....... [NO] transformer_inferencetransformer_inference .... [NO][NO] .............. [OKAY][OKAY] utils utils.................. ..................[YES] [YES]...... ......[OKAY] [OKAY] quantizer quantizer.............. ..............[NO] [NO]....... .......[OKAY] [OKAY] ---------------------------------------------------------------------------------------------------- ninjaninjaninjaninja .................. .................. .................. [OKAY].................. [OKAY] [OKAY]-------------------------------------------------- [OKAY] --------------------------------------------------op name-------------------------------------------------- -------------------------------------------------- op name................ op name ................op name installed ................ installed ................installed.... installed..compatiblecompatible compatible..-------------------------------------------------- -------------------------------------------------- -------------------------------------------------- compatible -------------------------------------------------- cpu_adamcpu_adamcpu_adam cpu_adam.............................. ............... ...............[YES][YES] [YES]......[YES]...... ......[OKAY]......[OKAY] [OKAY] [OKAY] fused_adam fused_adamfused_adamfused_adam............. [NO].......................... ............. .......[NO] [NO] [OKAY][NO].............. ....... [OKAY] fused_lamb[OKAY] [OKAY] .............fused_lamb [NO]fused_lamb fused_lamb ............. ....... .......................... [NO] [NO].......[OKAY][NO] ....... [OKAY] ....... [OKAY] [OKAY] sparse_attnsparse_attn sparse_attn ........................ sparse_attn[NO]............ [NO] ............[NO] .............. .......[NO][OKAY][OKAY] [OKAY]....... 
transformer [OKAY]............transformer [NO] transformer ....... transformer........................ [OKAY] ............ [NO] [NO] stochastic_transformer[NO] ....... ....... .[OKAY][OKAY]....... [NO][OKAY]stochastic_transformer ....... stochastic_transformer .[OKAY] stochastic_transformer[NO]. .......[NO] . [OKAY] .......[NO] [OKAY]....... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja-------------------------------------------------- --------------------------------------------------DeepSpeed C++/CUDA extension op report -------------------------------------------------- DeepSpeed C++/CUDA extension op reportNOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. ---------------------------------------------------------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. 
Op compatibility means that your system meet the required dependencies to JIT install the op.JIT compiled ops requires ninja -------------------------------------------------- JIT compiled ops requires ninja /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** ninjaninjaninjaninja .................. .................................... ..................[OKAY][OKAY][OKAY] [OKAY] -------------------------------------------------- ---------------------------------------------------------------------------------------------------- op name--------------------------------------------------op name op name ................op name ................ ................installed................ installed..installed compatible ..installed.. -------------------------------------------------- compatible .. compatible --------------------------------------------------compatible ---------------------------------------------------------------------------------------------------- cpu_adam ............... [YES] cpu_adam...... ...............[OKAY] cpu_adam [YES]cpu_adam .................................... [YES][YES][OKAY] fused_adam ...... ...... ............. [OKAY] [OKAY] [NO] ....... fused_adam[OKAY] ............. [NO] fused_lamb....... fused_adam[OKAY]fused_adam............. ..........................[NO]fused_lamb [NO].......[NO]............. .......[NO][OKAY]....... [OKAY][OKAY] ....... [OKAY] fused_lambfused_lamb .......................... sparse_attn[NO][NO] .......................... [NO][OKAY][OKAY] sparse_attn ....... ............[OKAY] [NO] ....... transformer[OKAY] ............ [NO]transformersparse_attn ...................sparse_attn ........................[OKAY] [NO] [NO][NO]....... [OKAY]stochastic_transformer ....... ....... . stochastic_transformer[NO][OKAY] [OKAY] ....... . transformer [OKAY] transformer[NO] ............ .......[NO]............ [OKAY]....... [NO][OKAY] ....... 
[OKAY] stochastic_transformer stochastic_transformer . .[NO] [NO]....... .......[OKAY] [OKAY] /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown ******** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** -------------------------------------------------- DeepSpeed C++/CUDA extension op report-------------------------------------------------- ---------------------------------------------------------------------------------------------------- DeepSpeed C++/CUDA extension op report NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.--------------------------------------------------JIT compiled ops requires ninja --------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. 
JIT compiled ops requires ninja---------------------------------------------------------------------------------------------------- JIT compiled ops requires ninjaDeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** ninjaninjaninjaninja .................. .................................... .................. [OKAY][OKAY] [OKAY] [OKAY] ---------------------------------------------------------------------------------------------------- ---------------------------------------------------------------------------------------------------- op name op name op nameop name................ ................................................installed installedinstalledinstalled .... compatible.. .. compatible compatiblecompatible -------------------------------------------------- ---------------------------------------------------------------------------------------------------- -------------------------------------------------- cpu_adamcpu_adam cpu_adam...............cpu_adam ............... [YES]............... ............... [YES][YES]...... [YES] ......[OKAY]...... ...... [OKAY] [OKAY] [OKAY] fused_adam fused_adam............. fused_adam.............[NO] fused_adam ............. [NO]....... ............. .......[OKAY][NO] [NO][OKAY]....... fused_lamb[OKAY]....... .............fused_lamb[OKAY] fused_lamb[NO] ............. ....................[NO] fused_lamb[NO].......[OKAY] ....... .............[OKAY] [OKAY][NO] ....... [OKAY] sparse_attn ............ 
[NO]sparse_attnsparse_attn ............................... [OKAY]sparse_attn[NO][NO] .......................... transformer [NO] [OKAY] [OKAY] ................... [OKAY][NO] transformer.......transformertransformer [OKAY].................................... [NO][NO] [NO] .............. stochastic_transformer....... [OKAY] [OKAY] [OKAY]. [NO] stochastic_transformer....... stochastic_transformer stochastic_transformer [OKAY] . ..[NO] [NO][NO]....... ..............[OKAY] [OKAY][OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... async_io[NO] ...................... [NO][NO] ....... [NO] transformer_inference ..transformer_inference [NO].. .......[NO] [OKAY]....... [OKAY] utils .................. utils[YES] ........................ [YES][OKAY] ...... [OKAY] quantizer quantizer.............. ..............[NO] [NO]....... .......[OKAY] [OKAY] -------------------------------------------------- --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... 
[OKAY] -------------------------------------------------- DeepSpeed general environment info:DeepSpeed general environment info: torch install pathtorch install path .............................. ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch versiontorch version ........................................ 1.8.11.8.1 torch cuda versiontorch cuda version .............................. 11.111.1 nvcc versionnvcc version .......................................... 11.211.2 deepspeed install pathdeepspeed install path ...................... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info deepspeed info................... ...................0.4.2+bc17042, bc17042, big-science 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. deepspeed wheel compiled w....... ......torch 1.8, cuda 11.1 torch 1.8, cuda 11.1 DeepSpeed general environment info:DeepSpeed general environment info: torch install pathtorch install path .............................. ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version ....................torch version 1.8.1.................... 1.8.1 torch cuda version ...............torch cuda version 11.1............... 11.1nvcc version nvcc version..................... .....................11.2 11.2deepspeed install path deepspeed install path........... ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed infodeepspeed info ...................................... 
0.4.2+bc17042, bc17042, big-science0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w.deepspeed wheel compiled w. ............ torch 1.8, cuda 11.1torch 1.8, cuda 11.1 DeepSpeed general environment info:DeepSpeed general environment info: torch install pathtorch install path .............................. ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch versiontorch version ........................................ 1.8.11.8.1 torch cuda versiontorch cuda version .............................. 11.111.1 nvcc versionnvcc version .......................................... 11.211.2 deepspeed install pathdeepspeed install path ...................... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info deepspeed info................... ...................0.4.2+bc17042, bc17042, big-science 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. deepspeed wheel compiled w....... ......torch 1.8, cuda 11.1 torch 1.8, cuda 11.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... 
[OKAY]
--------------------------------------------------

DeepSpeed general environment info:
torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']
torch version .................... 1.8.1
torch cuda version ............... 11.1
nvcc version ..................... 11.2
deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']
deepspeed info ................... 0.4.2+bc17042, bc17042, big-science
deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1

 [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`.
async_io ............... [NO] ....... [NO]
transformer_inference .. [NO] ....... [OKAY]
utils .................. [YES] ...... [OKAY]
quantizer .............. [NO] ....... [OKAY]
--------------------------------------------------

/bin/sh: line 0: type: git: not found
**** Git info for Megatron: git_hash=unknown git_branch=unknown ****

--------------------------------------------------
DeepSpeed C++/CUDA extension op report
--------------------------------------------------
NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.
--------------------------------------------------
JIT compiled ops requires ninja
--------------------------------------------------

ninja .................. [OKAY]
--------------------------------------------------
op name ................ installed .. compatible
--------------------------------------------------
cpu_adam ............... [YES] ...... [OKAY]
fused_adam ............. [NO] ....... [OKAY]
fused_lamb ............. [NO] ....... [OKAY]
sparse_attn ............ [NO] ....... [OKAY]
transformer ............ [NO] ....... [OKAY]
stochastic_transformer . [NO] ....... [OKAY]
--------------------------------------------------
Op compatibility means that your system meet the required dependencies to JIT install the op.-------------------------------------------------- -------------------------------------------------- -------------------------------------------------- --------------------------------------------------DeepSpeed C++/CUDA extension op report NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.JIT compiled ops requires ninja-------------------------------------------------- --------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- JIT compiled ops requires ninjaJIT compiled ops requires ninja  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. 
Can be fixed by: `apt install libaio-dev`. [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... async_io[NO] ...................... [NO][NO] ....... [NO] transformer_inference .. transformer_inference[NO] ......... [NO][OKAY] ....... [OKAY] utils ..................utils [YES].................. ......[YES] [OKAY]...... [OKAY] quantizer .............. [NO]quantizer ..................... [OKAY][NO] ....... [OKAY] -------------------------------------------------- --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`.  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] transformer_inference .. [NO] utils....... ..................[OKAY] [YES] ...... [OKAY]utils .................. quantizer[YES] .................... [NO][OKAY] ....... [OKAY] quantizer .............. --------------------------------------------------[NO] ....... [OKAY] -------------------------------------------------- ninjaninjaninjaninja ........................................................................ [OKAY] [OKAY] [OKAY][OKAY] -------------------------------------------------- -------------------------------------------------- ---------------------------------------------------------------------------------------------------- op name op name................op name op name................ installed ................ installed ................installed.. .. installed..compatible compatiblecompatible--------------------------------------------------.. 
----------------------------------------------------------------------------------------------------compatible -------------------------------------------------- cpu_adam ............... [YES] ......cpu_adam cpu_adamcpu_adam[OKAY] .............................. ............... [YES] [YES] [YES] ...... ............ [OKAY] fused_adam [OKAY][OKAY] ............. [NO] ....... [OKAY] fused_adamfused_lambfused_adam fused_adam ............. ............. .......................... [NO] [NO] [NO].......[NO]....... .......[OKAY][OKAY]....... [OKAY][OKAY] fused_lamb ............. fused_lambfused_lamb[NO] ................................. [NO] sparse_attn [OKAY] [NO]....... ............ .......[NO][OKAY] .......[OKAY] [OKAY] transformer ............ [NO] sparse_attn....... ............[OKAY] sparse_attn[NO]sparse_attn stochastic_transformer............................... [NO].[OKAY] [NO] ....... [NO] .......transformer[OKAY]....... ............[OKAY] transformer [OKAY] [NO] transformer ................... ............[OKAY] [NO] [NO] .............. stochastic_transformer[OKAY] [OKAY] . [NO] stochastic_transformer.......stochastic_transformer [OKAY] .. [NO][NO] .............. [OKAY][OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- --------------------------------------------------DeepSpeed C++/CUDA extension op report -------------------------------------------------- DeepSpeed C++/CUDA extension op reportNOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. 
---------------------------------------------------------------------------------------------------- -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.JIT compiled ops requires ninja DeepSpeed C++/CUDA extension op report-------------------------------------------------- -------------------------------------------------- JIT compiled ops requires ninjaNOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninjaninjaninjaninja .................. .................................... .................. [OKAY][OKAY] [OKAY][OKAY]---------------------------------------------------------------------------------------------------- op name-------------------------------------------------- --------------------------------------------------op name ................ op name................op nameinstalled ..................................installed installedcompatible.. installed --------------------------------------------------....compatible compatiblecompatible -------------------------------------------------- ----------------------------------------------------------------------------------------------------cpu_adam ............... [YES] ...... [OKAY] cpu_adam ...............cpu_adamcpu_adam [YES].............................. ......[YES][YES] fused_adam ...... [OKAY]...... .............[OKAY] [OKAY][NO] ....... [OKAY] fused_adamfused_lamb .............fused_adam............. fused_adam[NO][NO]............. ....................[NO] ....... [OKAY] [NO].......[OKAY] .......[OKAY] [OKAY] fused_lamb .............fused_lamb fused_lamb [NO]............. sparse_attn [NO]................................ 
.......[NO][NO][OKAY] .......[OKAY]....... [OKAY][OKAY] transformer ............ [NO] sparse_attn....... sparse_attn............[OKAY] ............sparse_attn[NO] stochastic_transformer [NO]............ ....... ........ [NO] [OKAY][NO][OKAY] .............. transformertransformer [OKAY][OKAY] ............ ............ [NO]transformer [NO] ....... ............ ....... [OKAY][NO][OKAY] ....... [OKAY] stochastic_transformer stochastic_transformer .stochastic_transformer. [NO][NO]. .......[NO]....... .......[OKAY][OKAY] [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io async_io............... ...............[NO] [NO]....... .......[NO] [NO] transformer_inference transformer_inference.. ..[NO] [NO]....... .......[OKAY] [OKAY] utils .................. [YES]utils ........................ [OKAY][YES] ...... quantizer[OKAY] .............. [NO] .......quantizer [OKAY].............. [NO] ....... --------------------------------------------------[OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_ioasync_io .............................. [NO][NO] .............. [NO][NO] transformer_inferencetransformer_inference .... [NO][NO] .............. [OKAY][OKAY] utilsutils .................................... [YES][YES] ............ [OKAY] [OKAY] quantizer ..............quantizer [NO] ..................... [NO][OKAY] ....... [OKAY] -------------------------------------------------- --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. 
Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']DeepSpeed general environment info: torch version .................... 1.8.1 torch install pathtorch cuda version .............................. 
11.1 nvcc version ..................... 11.2['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] deepspeed install path ...........torch version ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'].................... deepspeed info1.8.1 ................... 0.4.2+bc17042, bc17042, big-sciencetorch cuda version ...............deepspeed wheel compiled w. ......11.1 torch 1.8, cuda 11.1nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- --------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. DeepSpeed C++/CUDA extension op report---------------------------------------------------------------------------------------------------- JIT compiled ops requires ninja--------------------------------------------------DeepSpeed C++/CUDA extension op report NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.-------------------------------------------------- -------------------------------------------------- --------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. 
Op compatibility means that your system meet the required dependencies to JIT install the op.JIT compiled ops requires ninja DeepSpeed C++/CUDA extension op report-------------------------------------------------- --------------------------------------------------JIT compiled ops requires ninja NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninjaninjaninjaninja ........................................................................ [OKAY] [OKAY] [OKAY][OKAY] -------------------------------------------------- -------------------------------------------------- --------------------------------------------------op name-------------------------------------------------- op name ................op name................op name installedinstalled................................ .. ..installed installed compatible ..compatible..-------------------------------------------------- compatible --------------------------------------------------compatible -------------------------------------------------- -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY]cpu_adam cpu_adamcpu_adam .............................. ............... [YES] [YES] [YES]...... fused_adam ...... ......[OKAY][OKAY] ............. [OKAY][NO] ....... [OKAY] fused_lambfused_adam fused_adam .......................... fused_adam .............[NO] [NO] .................... [NO] ....... [NO][OKAY] ....... [OKAY] .......[OKAY] fused_lamb[OKAY] fused_lamb............. fused_lamb.............[NO] .............sparse_attn[NO]....... [OKAY]...................[NO] [NO]....... [OKAY]....... [OKAY][OKAY] transformer ............sparse_attn [NO]............ .......[NO] sparse_attn [OKAY]....... sparse_attn ............[OKAY] stochastic_transformer[NO]............ 
transformer........ [NO] ............[NO] [OKAY] .......[NO]....... transformer.......[OKAY] [OKAY] ............ [OKAY] transformer [NO] ................... [NO][OKAY] stochastic_transformer ....... [OKAY]. stochastic_transformer [NO] stochastic_transformer........ [OKAY][NO]. .......[NO] [OKAY]....... [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed general environment info:DeepSpeed general environment info: torch install pathtorch install path .............................. ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch versiontorch version ........................................ 1.8.11.8.1 torch cuda versiontorch cuda version .............................. 11.111.1 nvcc versionnvcc version .......................................... 11.211.2 deepspeed install pathdeepspeed install path ...................... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed infodeepspeed info ...................................... 0.4.2+bc17042, bc17042, big-science0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w.deepspeed wheel compiled w. ............ torch 1.8, cuda 11.1torch 1.8, cuda 11.1 DeepSpeed general environment info:DeepSpeed general environment info: torch install pathtorch install path .............................. 
['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch versiontorch version ........................................ 1.8.11.8.1 torch cuda versiontorch cuda version .............................. 11.111.1 nvcc versionnvcc version .......................................... 11.211.2 deepspeed install pathdeepspeed install path ...................... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed infodeepspeed info ...................................... 0.4.2+bc17042, bc17042, big-science 0.4.2+bc17042, bc17042, big-sciencedeepspeed wheel compiled w. ......deepspeed wheel compiled w. torch 1.8, cuda 11.1...... torch 1.8, cuda 11.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... 
[OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... 
[OKAY] -------------------------------------------------- -------------------------------------------------- --------------------------------------------------DeepSpeed C++/CUDA extension op report --------------------------------------------------DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- DeepSpeed C++/CUDA extension op report-------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.JIT compiled ops requires ninja ---------------------------------------------------------------------------------------------------- JIT compiled ops requires ninja DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... 
['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info:DeepSpeed general environment info: torch install pathtorch install path .............................. ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch versiontorch version ........................................ 1.8.11.8.1 torch cuda versiontorch cuda version .............................. 11.111.1 ninjaninjaninjaninja ...................................................... ..................[OKAY][OKAY][OKAY] nvcc versionnvcc version .......................................... 
--------------------------------------------------
DeepSpeed C++/CUDA extension op report
--------------------------------------------------
NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.
--------------------------------------------------
JIT compiled ops requires ninja
ninja .................. [OKAY]
--------------------------------------------------
op name ................ installed .. compatible
--------------------------------------------------
cpu_adam ............... [YES] ...... [OKAY]
fused_adam ............. [NO] ....... [OKAY]
fused_lamb ............. [NO] ....... [OKAY]
sparse_attn ............ [NO] ....... [OKAY]
transformer ............ [NO] ....... [OKAY]
stochastic_transformer . [NO] ....... [OKAY]
--------------------------------------------------
 [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`.
async_io ............... [NO] ....... [NO]
transformer_inference .. [NO] ....... [OKAY]
utils .................. [YES] ...... [OKAY]
quantizer .............. [NO] ....... [OKAY]
--------------------------------------------------
DeepSpeed general environment info:
torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']
torch version .................... 1.8.1
torch cuda version ............... 11.1
nvcc version ..................... 11.2
deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']
deepspeed info ................... 0.4.2+bc17042, bc17042, big-science
deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1
/bin/sh: line 0: type: git: not found
**** Git info for Megatron: git_hash=unknown git_branch=unknown ****
['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch versiontorch version ........................................ 1.8.11.8.1 torch cuda versiontorch cuda version .............................. 11.111.1 nvcc versionnvcc version .......................................... 11.211.2 deepspeed install pathdeepspeed install path ...................... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed infodeepspeed info ...................................... 0.4.2+bc17042, bc17042, big-science0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w.deepspeed wheel compiled w. ............ torch 1.8, cuda 11.1torch 1.8, cuda 11.1 **** Git info for Megatron: git_hash=unknown git_branch=unknown **** using world size: 256, data-parallel-size: 8, tensor-model-parallel size: 4, pipeline-model-parallel size: 8 using torch.float16 for parameters ... ------------------------ arguments ------------------------ accumulate_allreduce_grads_in_fp32 .............. False adam_beta1 ...................................... 0.9 adam_beta2 ...................................... 0.999 adam_eps ........................................ 1e-08 adlr_autoresume ................................. False adlr_autoresume_interval ........................ 1000 apply_query_key_layer_scaling ................... True apply_residual_connection_post_layernorm ........ False attention_dropout ............................... 0.1 attention_softmax_in_fp32 ....................... False bert_binary_head ................................ True bert_load ....................................... None bf16 ............................................ False bias_dropout_fusion ............................. True bias_gelu_fusion ................................ 
True biencoder_projection_dim ........................ 0 biencoder_shared_query_context_model ............ False block_data_path ................................. None checkpoint_activations .......................... True checkpoint_in_cpu ............................... False checkpoint_num_layers ........................... 1 clip_grad ....................................... 1.0 codecarbon_dir .................................. /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/tr8-104B-data/codecarbon consumed_train_samples .......................... 0 consumed_valid_samples .......................... 0 contigious_checkpointing ........................ False cpu_optimizer ................................... False cpu_torch_adam .................................. False data_impl ....................................... mmap data_parallel_size .............................. 8 data_path ....................................... ['/gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document'] dataloader_type ................................. single DDP_impl ........................................ local decoder_seq_length .............................. None deepscale ....................................... False deepscale_config ................................ None deepspeed ....................................... True deepspeed_activation_checkpointing .............. True deepspeed_config ................................ ./ds_config.1161730.json deepspeed_mpi ................................... False distribute_checkpointed_activations ............. False distributed_backend ............................. nccl embedding_path .................................. None encoder_seq_length .............................. 2048 eod_mask_loss ................................... False eval_interval ................................... 1000 eval_iters ...................................... 5 evidence_data_path .............................. 
None exit_duration_in_mins ........................... 110 exit_interval ................................... None ffn_hidden_size ................................. 20480 finetune ........................................ False fp16 ............................................ True fp16_lm_cross_entropy ........................... False fp32_residual_connection ........................ False global_batch_size ............................... 2048 hidden_dropout .................................. 0.1 hidden_size ..................................... 16384 hysteresis ...................................... 2 ict_head_size ................................... None ict_load ........................................ None img_dim ......................................... 224 indexer_batch_size .............................. 128 indexer_log_interval ............................ 1000 init_method_std ................................. 0.02 init_method_xavier_uniform ...................... False initial_loss_scale .............................. 4294967296 kv_channels ..................................... 512 layernorm_epsilon ............................... 1e-05 lazy_mpu_init ................................... None load ............................................ /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints local_rank ...................................... 0 log_batch_size_to_tensorboard ................... True log_interval .................................... 1 log_learning_rate_to_tensorboard ................ True log_loss_scale_to_tensorboard ................... True log_num_zeros_in_grad ........................... False log_params_norm ................................. False log_timers_to_tensorboard ....................... True log_validation_ppl_to_tensorboard ............... True loss_scale ...................................... 12.0 loss_scale_window ............................... 1000 lr .............................................. 
6e-05 lr_decay_iters .................................. None lr_decay_samples ................................ 126953125 lr_decay_style .................................. cosine lr_warmup_fraction .............................. None lr_warmup_iters ................................. 0 lr_warmup_samples ............................... 216320 make_vocab_size_divisible_by .................... 128 mask_prob ....................................... 0.15 masked_softmax_fusion ........................... True max_position_embeddings ......................... 2048 memory_centric_tiled_linear ..................... False merge_file ...................................... /gpfswork/rech/six/commun/code/tr8-104B/Megatron-DeepSpeed-tr8-104B/data/gpt2-merges.txt micro_batch_size ................................ 1 min_loss_scale .................................. 1.0 min_lr .......................................... 6e-06 mmap_warmup ..................................... False no_load_optim ................................... None no_load_rng ..................................... None no_save_optim ................................... None no_save_rng ..................................... None num_attention_heads ............................. 32 num_channels .................................... 3 num_classes ..................................... 1000 num_layers ...................................... 32 num_layers_per_virtual_pipeline_stage ........... None num_workers ..................................... 2 onnx_safe ....................................... None openai_gelu ..................................... False optimizer ....................................... adam override_lr_scheduler ........................... False params_dtype .................................... torch.float16 partition_activations ........................... False patch_dim ....................................... 16 pipeline_model_parallel_size .................... 
8 position_embedding_type ......................... PositionEmbeddingType.absolute profile_backward ................................ False query_in_block_prob ............................. 0.1 rampup_batch_size ............................... ['16', '16', '6_000_000'] rank ............................................ 0 remote_device ................................... none reset_attention_mask ............................ False reset_position_ids .............................. False retriever_report_topk_accuracies ................ [] retriever_score_scaling ......................... False retriever_seq_length ............................ 256 sample_rate ..................................... 1.0 save ............................................ /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints save_interval ................................... 1500 scatter_gather_tensors_in_pipeline .............. True scattered_embeddings ............................ False seed ............................................ 42 seq_length ...................................... 2048 sgd_momentum .................................... 0.9 short_seq_prob .................................. 0.1 split ........................................... 949,50,1 split_transformers .............................. False synchronize_each_layer .......................... False tensor_model_parallel_size ...................... 4 tensorboard_dir ................................. /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/tr8-104B-data/tensorboard tensorboard_log_interval ........................ 1 tensorboard_queue_size .......................... 5 tile_factor ..................................... 1 titles_data_path ................................ None tokenizer_name_or_path .......................... None tokenizer_type .................................. GPT2BPETokenizer train_iters ..................................... None train_samples ................................... 
300000000 use_checkpoint_lr_scheduler ..................... False use_contiguous_buffers_in_ddp ................... False use_cpu_initialization .......................... None use_one_sent_docs ............................... False use_pin_memory .................................. False virtual_pipeline_model_parallel_size ............ None vocab_extra_ids ................................. 0 vocab_file ...................................... /gpfswork/rech/six/commun/code/tr8-104B/Megatron-DeepSpeed-tr8-104B/data/gpt2-vocab.json weight_decay .................................... 0.1 world_size ...................................... 256 zero_allgather_bucket_size ...................... 0.0 zero_contigious_gradients ....................... False zero_reduce_bucket_size ......................... 0.0 zero_reduce_scatter ............................. False zero_stage ...................................... 1 -------------------- end of arguments --------------------- will use batch size rampup starting from global batch size 16 to global batch size 2048 with batch size increments 16 over 6000000 samples. > building GPT2BPETokenizer tokenizer ... /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown ******** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... 
['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info:DeepSpeed general environment info: torch install pathtorch install path .............................. ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch versiontorch version ........................................ 1.8.11.8.1 torch cuda versiontorch cuda version .............................. 11.111.1 nvcc versionnvcc version .......................................... 11.211.2 deepspeed install pathdeepspeed install path ...................... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed infodeepspeed info ...................................... 0.4.2+bc17042, bc17042, big-science0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w.deepspeed wheel compiled w. ............ torch 1.8, cuda 11.1torch 1.8, cuda 11.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. 
[YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... DeepSpeed general environment info:['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science torch install pathdeepspeed wheel compiled w. ..................... torch 1.8, cuda 11.1 ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... 
torch 1.8, cuda 11.1 ---------------------------------------------------------------------------------------------------- DeepSpeed C++/CUDA extension op reportDeepSpeed C++/CUDA extension op report ------------------------------------------------------------------------------------------------------------------------------------------------------ NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.DeepSpeed C++/CUDA extension op report ------------------------------------------------------------------------------------------------------------------------------------------------------ -------------------------------------------------- JIT compiled ops requires ninja JIT compiled ops requires ninja NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. DeepSpeed C++/CUDA extension op report ---------------------------------------------------------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.JIT compiled ops requires ninja -------------------------------------------------- JIT compiled ops requires ninja ninjaninjaninjaninja ........................................................................ [OKAY][OKAY][OKAY][OKAY] -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- op nameop nameop name op name................................ 
................ installed................ installed installed.. installed .. .. compatible.. compatible compatible--------------------------------------------------compatible ---------------------------------------------------------------------------------------------------- -------------------------------------------------- cpu_adam ...............cpu_adam cpu_adamcpu_adam[YES] ................................................... [YES][YES] [OKAY] [YES]...... ...... ......[OKAY][OKAY] [OKAY] fused_adam ............. [NO] fused_adamfused_adam.......fused_adam ............. ............. .............[OKAY] [NO] [NO][NO] .......fused_lamb.............. [OKAY].............[OKAY] [OKAY] [NO]fused_lamb .......fused_lamb............. fused_lamb [OKAY] ............. [NO]............. [NO].......[NO] ..............[OKAY] [OKAY][OKAY] sparse_attn ............ [NO] ....... [OKAY] sparse_attntransformersparse_attnsparse_attn ........................ [NO] ............................... [NO][OKAY][NO][NO] ....... ....... transformer.......[OKAY] [OKAY]............[OKAY] stochastic_transformer[NO] transformer.......transformer. ............ ............ [NO][OKAY] [NO][NO]....... .......stochastic_transformer[OKAY]....... [OKAY]. [OKAY] [NO] .......stochastic_transformer [OKAY]stochastic_transformer . [NO]. .......[NO] [OKAY]....... 
[OKAY] /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. ---------------------------------------------------------------------------------------------------- JIT compiled ops requires ninja DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. 
Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown ******** Git info for Megatron: git_hash=unknown git_branch=unknown **** ninja ninja.................. ..................[OKAY] [OKAY] -------------------------------------------------- -------------------------------------------------- op name op name................ ................ installedinstalled .... compatiblecompatible ---------------------------------------------------------------------------------------------------- cpu_adamcpu_adam .............................. [YES][YES] ............ [OKAY] [OKAY] fused_adamfused_adam .......................... [NO][NO] .............. [OKAY][OKAY] fused_lambfused_lamb .......................... [NO][NO] .............. [OKAY][OKAY] sparse_attnsparse_attn ............ ............[NO] [NO]....... .......[OKAY] [OKAY] transformertransformer ............ ............[NO] [NO]....... .......[OKAY] [OKAY] stochastic_transformer stochastic_transformer . [NO]. ....... [NO][OKAY] ....... [OKAY] ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... 
[OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] ---------------------------------------------------------------------------------------------------- DeepSpeed C++/CUDA extension op reportDeepSpeed C++/CUDA extension op report ------------------------------------------------------------------------------------------------------------------------------------------------------ NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.DeepSpeed C++/CUDA extension op report-------------------------------------------------- ------------------------------------------------------------------------------------------------------------------------------------------------------ DeepSpeed C++/CUDA extension op report JIT compiled ops requires ninjaJIT compiled ops requires ninja NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- --------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. JIT compiled ops requires ninja-------------------------------------------------- JIT compiled ops requires ninja DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 
11.1
nvcc version ..................... 11.2
deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']
deepspeed info ................... 0.4.2+bc17042, bc17042, big-science
deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1
DeepSpeed general environment info:
torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']
torch version .................... 1.8.1
torch cuda version ............... 11.1
nvcc version ..................... 11.2
deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']
deepspeed info ................... 0.4.2+bc17042, bc17042, big-science
deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1
--------------------------------------------------
DeepSpeed C++/CUDA extension op report
--------------------------------------------------
NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.
--------------------------------------------------
JIT compiled ops requires ninja
--------------------------------------------------
ninja .................. [OKAY]
--------------------------------------------------
op name ................ installed .. compatible
--------------------------------------------------
cpu_adam ............... [YES] ...... [OKAY]
fused_adam ............. [NO] ....... [OKAY]
fused_lamb ............. [NO] ....... [OKAY]
sparse_attn ............ [NO] ....... [OKAY]
transformer ............ [NO] ....... [OKAY]
stochastic_transformer . [NO] ....... [OKAY]
--------------------------------------------------
/bin/sh: line 0: type: git: not found
**** Git info for Megatron: git_hash=unknown git_branch=unknown ****
 [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`.
async_io ............... [NO] ....... [NO]
transformer_inference .. [NO] ....... [OKAY]
utils .................. [YES] ...... [OKAY]
quantizer .............. [NO] ....... [OKAY]
--------------------------------------------------
> padded vocab (size: 50257) with 431 dummy tokens (new size: 50688)
DeepSpeed general environment info:
torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']
torch version .................... 1.8.1
torch cuda version ............... 11.1
nvcc version ..................... 11.2
deepspeed install path ...........
['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']deepspeed info deepspeed info................... ...................0.4.2+bc17042, bc17042, big-science 0.4.2+bc17042, bc17042, big-sciencedeepspeed wheel compiled w. deepspeed wheel compiled w....... ......torch 1.8, cuda 11.1 torch 1.8, cuda 11.1 ninjaninjaninjaninja ........................................................................ [OKAY][OKAY][OKAY][OKAY] ------------------------------------------------------------------------------------------------------------------------------------------------------ -------------------------------------------------- op nameop name op nameop name ................ ................................ ................ installedinstalled installed installed ...... ..compatiblecompatiblecompatible compatible------------------------------------------------------------------------------------------------------------------------------------------------------ -------------------------------------------------- cpu_adamcpu_adam cpu_adam ...............cpu_adam ...............[YES]............... .....................[YES][YES] [OKAY]......[YES]...... [OKAY] ...... [OKAY] [OKAY] fused_adam ............. fused_adam[NO] fused_adamfused_adam............. ....... ............. ............. [NO][OKAY] [NO][NO]....... fused_lamb.......[OKAY]....... ............. [OKAY] [OKAY][NO] fused_lamb .................... fused_lamb fused_lamb[OKAY][NO] ............. ............. ....... [NO] [NO] [OKAY] ....... ....... [OKAY][OKAY] sparse_attn ............ [NO] ....... [OKAY] sparse_attn ............transformer [NO]............ sparse_attnsparse_attn ....... [NO] ............ ............[OKAY] ....... [NO] [NO] [OKAY] transformer .............. ............[OKAY][OKAY] [NO]stochastic_transformer ....... [OKAY]. 
transformer transformer [NO] ............stochastic_transformer ................... [NO][OKAY][NO] . ....... ....... [NO] [OKAY] [OKAY]....... [OKAY] stochastic_transformerstochastic_transformer .. [NO][NO] .............. [OKAY][OKAY] DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 
1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science DeepSpeed general environment info:deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja torch cuda version ............... 11.1 nvcc version ..................... 11.2 -------------------------------------------------- --------------------------------------------------DeepSpeed C++/CUDA extension op report --------------------------------------------------DeepSpeed C++/CUDA extension op report-------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. 
Op compatibility means that your system meet the required dependencies to JIT install the op.DeepSpeed C++/CUDA extension op report-------------------------------------------------- ----------------------------------------------------------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. JIT compiled ops requires ninjaNOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] meet the required dependencies to JIT install the op.-------------------------------------------------- --------------------------------------------------JIT compiled ops requires ninja JIT compiled ops requires ninja deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 > setting codecarbon ... ninjaninjaninjaninja ........................................................................ [OKAY][OKAY] [OKAY] ----------------------------------------------------------------------------------------------------[OKAY] op nameop name---------------------------------------------------------------------------------------------------- ................................op name op nameinstalled installed .................................... installedinstalledcompatiblecompatible .. ..-------------------------------------------------- -------------------------------------------------- compatible compatible ---------------------------------------------------------------------------------------------------- cpu_adam ............... cpu_adam[YES] .....................cpu_adam cpu_adam [YES][OKAY]............... ......[YES]............... [OKAY]......[YES] [OKAY]...... 
fused_adam [OKAY]............. fused_adam[NO] .................... fused_adam[NO][OKAY] .................... fused_adamfused_lamb [OKAY] [NO]............. .......[NO]............. fused_lamb ....... [OKAY] .............[NO][OKAY] [NO] .......fused_lamb....... [OKAY][OKAY]............. [NO] ....... sparse_attn[OKAY]fused_lamb ......................... [NO] [NO]sparse_attn....... ............[OKAY]....... sparse_attn [NO] transformer[OKAY] ........................ ....... [NO][NO][OKAY] ....... .......[OKAY]transformer [OKAY]............sparse_attn stochastic_transformer [NO] ............ . .......transformer [NO][NO]............ [OKAY] .......[NO]....... [OKAY]stochastic_transformer[OKAY] ....... .[OKAY] transformer [NO] ...................stochastic_transformer [OKAY] [NO] ........ [NO][OKAY] ....... [OKAY]stochastic_transformer . [NO] ....... [OKAY] /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown ****  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_ioasync_io .............................. [NO][NO] .............. [NO][NO] transformer_inference transformer_inference.. ..[NO] [NO]....... .......[OKAY] [OKAY] utils .................. [YES]utils ........................ [YES][OKAY] ...... [OKAY] quantizer .............. [NO]quantizer ..................... [OKAY][NO] ....... [OKAY] -------------------------------------------------- -------------------------------------------------- /bin/sh: line 0: type: git: not found  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_ioasync_io ............... ...............[NO] [NO]....... 
.......[NO] [NO] transformer_inference .. [NO] transformer_inference....... ..[OKAY] [NO] ....... [OKAY] utils .................. [YES]utils ........................ [OKAY][YES] ...... [OKAY] quantizer .............. [NO]quantizer ..................... [OKAY][NO] ....... [OKAY]-------------------------------------------------- -------------------------------------------------- /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown ******** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info:DeepSpeed general environment info: torch install pathtorch install path .............................. 
['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch versiontorch version ........................................ 1.8.11.8.1 torch cuda versiontorch cuda version .............................. 11.111.1 nvcc versionnvcc version .......................................... 11.211.2 deepspeed install pathdeepspeed install path ...................... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed infodeepspeed info ...................................... 0.4.2+bc17042, bc17042, big-science0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w.deepspeed wheel compiled w. ............ torch 1.8, cuda 11.1torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found DeepSpeed general environment info: DeepSpeed general environment info:torch install path ............... torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']torch version .................... torch version1.8.1 .................... torch cuda version 1.8.1............... 11.1torch cuda version nvcc version............... .....................11.1 11.2nvcc version deepspeed install path..................... ...........11.2 deepspeed install path['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] ........... deepspeed info ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']................... 
deepspeed info0.4.2+bc17042, bc17042, big-science ...................deepspeed wheel compiled w. 0.4.2+bc17042, bc17042, big-science...... deepspeed wheel compiled w.torch 1.8, cuda 11.1 ...... torch 1.8, cuda 11.1 **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown ******** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... 
torch 1.8, cuda 11.1 -------------------------------------------------- --------------------------------------------------DeepSpeed C++/CUDA extension op report -------------------------------------------------- DeepSpeed C++/CUDA extension op reportNOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. ------------------------------------------------------------------------------------------------------------------------------------------------------ --------------------------------------------------JIT compiled ops requires ninjaNOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. DeepSpeed C++/CUDA extension op reportDeepSpeed C++/CUDA extension op report-------------------------------------------------- --------------------------------------------------JIT compiled ops requires ninja-------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.-------------------------------------------------- JIT compiled ops requires ninja-------------------------------------------------- JIT compiled ops requires ninja /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown ****  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. 
[NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- ninjaninjaninjaninja ........................................................................ [OKAY][OKAY][OKAY] [OKAY] -------------------------------------------------- -------------------------------------------------- ---------------------------------------------------------------------------------------------------- op name op name op name................ ................op name ................ installedinstalled................ installed .. installed ....compatible compatible..compatible-------------------------------------------------- -------------------------------------------------- compatible -------------------------------------------------- -------------------------------------------------- cpu_adamcpu_adam ...............cpu_adam............... cpu_adam[YES] ............... [YES]...............[YES]...... ...... [OKAY][YES] ...... [OKAY]......[OKAY] [OKAY] fused_adam ............. [NO] fused_adamfused_adam.......fused_adam ............. ..........................[OKAY] [NO] [NO][NO]....... fused_lamb..............[OKAY] .............[OKAY] [OKAY]fused_lamb[NO] .............fused_lamb....... [NO].............[OKAY]fused_lamb .......[NO]............. .......[NO][OKAY] [OKAY] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] sparse_attnsparse_attn transformer sparse_attn........................ [NO] ........................ [NO] ....... [NO][NO] ....... [OKAY] ....... .......[OKAY][OKAY] transformer[OKAY] ............transformer stochastic_transformertransformer [NO] ............ 
....................[NO] [NO] [NO][OKAY].............. .......[OKAY][OKAY] stochastic_transformer[OKAY] stochastic_transformer. stochastic_transformer[NO]. .......[NO] . [OKAY] ....... [NO] [OKAY]....... [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`.  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] async_io ............... [NO] ....... [NO] transformer_inference .. [NO] .......transformer_inference [OKAY].. [NO] ....... utils[OKAY] .................. [YES] ...... [OKAY]utils .................. quantizer[YES] .................... [NO][OKAY] ....... [OKAY] quantizer ..............-------------------------------------------------- [NO] ....... [OKAY] -------------------------------------------------- /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown ******** Git info for Megatron: git_hash=unknown git_branch=unknown ****  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io async_io............... ...............[NO] [NO]....... .......[NO] [NO] transformer_inferencetransformer_inference .... [NO][NO] .............. [OKAY][OKAY] utils ..................utils [YES].................. ......[YES] [OKAY]...... [OKAY] quantizer quantizer.............. ..............[NO] [NO]....... .......[OKAY] [OKAY] -------------------------------------------------- --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. 
Can be fixed by: `apt install libaio-dev`. async_ioasync_io .............................. [NO][NO] .............. [NO][NO] transformer_inference .. [NO]transformer_inference ......... [OKAY][NO] ....... [OKAY] utils ..................utils [YES].................. ......[YES] [OKAY]...... [OKAY] quantizer .............. quantizer[NO] ..................... [NO][OKAY] ....... [OKAY] -------------------------------------------------- -------------------------------------------------- /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1DeepSpeed general environment info: nvcc version ..................... 11.2 deepspeed install path ...........torch install path ...............['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']deepspeed wheel compiled w. ...... torch versiontorch 1.8, cuda 11.1 .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... 
torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown ******** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info: DeepSpeed general environment info:torch install path ............... torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version ....................['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] 1.8.1 torch versiontorch cuda version ................................... 1.8.111.1 nvcc versiontorch cuda version .................................... 11.211.1 deepspeed install path nvcc version........... ..................... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']11.2 deepspeed infodeepspeed install path .............................. 0.4.2+bc17042, bc17042, big-science['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed wheel compiled w.deepspeed info ......................... torch 1.8, cuda 11.10.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown ******** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown ****  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. 
Can be fixed by: `apt install libaio-dev`.
async_io ............... [NO] ....... [NO]
transformer_inference .. [NO] ....... [OKAY]
utils .................. [YES] ...... [OKAY]
quantizer .............. [NO] ....... [OKAY]
--------------------------------------------------
DeepSpeed general environment info:
torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']
torch version .................... 1.8.1
torch cuda version ............... 11.1
nvcc version ..................... 11.2
deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']
deepspeed info ................... 0.4.2+bc17042, bc17042, big-science
deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1
/bin/sh: line 0: type: git: not found
**** Git info for Megatron: git_hash=unknown git_branch=unknown ****
--------------------------------------------------
DeepSpeed C++/CUDA extension op report
--------------------------------------------------
NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.
--------------------------------------------------
JIT compiled ops requires ninja
ninja .................. [OKAY]
--------------------------------------------------
op name ................ installed .. compatible
--------------------------------------------------
cpu_adam ............... [YES] ...... [OKAY]
fused_adam ............. [NO] ....... [OKAY]
fused_lamb ............. [NO] ....... [OKAY]
sparse_attn ............ [NO] ....... [OKAY]
transformer ............ [NO] ....... [OKAY]
stochastic_transformer . [NO] ....... [OKAY]
--------------------------------------------------
> initializing torch distributed ...
> setting tensorboard ...
> initializing tensor model parallel with size 4
> initializing pipeline model parallel with size 8
> setting random seeds to 42 ...
[2021-09-24 02:11:21,021] [INFO] [checkpointing.py:226:model_parallel_cuda_manual_seed] > initializing model parallel cuda seeds on global rank 0, model parallel rank 0, and data parallel rank 0 with model parallel seed: 2760 and data parallel seed: 42
> compiling dataset index builder ...
make: Entering directory '/gpfsssd/worksf/projects/rech/six/commun/code/tr8-104B/Megatron-DeepSpeed-tr8-104B/megatron/data'
make: Nothing to be done for 'default'.
make: Leaving directory '/gpfsssd/worksf/projects/rech/six/commun/code/tr8-104B/Megatron-DeepSpeed-tr8-104B/megatron/data'
>>> done with dataset index builder. Compilation time: 0.307 seconds
> compiling and loading fused kernels ...
/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning:
!! WARNING !!
Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension.
See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source.
!! WARNING !!
warnings.warn(WRONG_COMPILER_WARNING.format(
Detected CUDA files, patching ldflags
Emitting ninja build file /gpfsssd/worksf/projects/rech/six/commun/code/tr8-104B/Megatron-DeepSpeed-tr8-104B/megatron/fused_kernels/build/build.ninja...
Building extension module scaled_upper_triang_masked_softmax_cuda...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
Loading extension module scaled_upper_triang_masked_softmax_cuda...
Building extension module scaled_masked_softmax_cuda...
ninja: no work to do.
Loading extension module scaled_masked_softmax_cuda...
/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning: !! WARNING !! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension. See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source. !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !! WARNING !! warnings.warn(WRONG_COMPILER_WARNING.format( Detected CUDA files, patching ldflags Emitting ninja build file /gpfsssd/worksf/projects/rech/six/commun/code/tr8-104B/Megatron-DeepSpeed-tr8-104B/megatron/fused_kernels/build/build.ninja... Building extension module fused_mix_prec_layer_norm_cuda... Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N) ninja: no work to do. Loading extension module fused_mix_prec_layer_norm_cuda... /gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning: !! WARNING !! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension. See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source. !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !! WARNING !! 
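The two recurring warnings above are environmental rather than fatal. A minimal sketch of how one might address them, assuming Debian-style packaging and that `torch.utils.cpp_extension` honors the standard `CXX` environment variable when choosing its compiler (neither assumption is stated in the log itself, and installing system packages is typically not possible on a shared cluster):

```shell
# async_io op: the missing headers could be installed on a box with root access:
#   apt install -y libaio-dev
# (commented out here; requires root and a network connection)

# Compiler mismatch: point the extension builder at g++ instead of the
# default `c++` alias before launching training, so the
# WRONG_COMPILER_WARNING is not emitted on every rank.
export CXX=g++
echo "CXX=$CXX"
```

This only changes which compiler driver is invoked; when `c++` is already a symlink to `g++`, the warning is cosmetic and the built kernels are identical.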
warnings.warn(WRONG_COMPILER_WARNING.format( /gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning: !! WARNING !! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension. See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source. !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !! WARNING !! warnings.warn(WRONG_COMPILER_WARNING.format( /gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning: !! WARNING !! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension. See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source. !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !! WARNING !! warnings.warn(WRONG_COMPILER_WARNING.format( >>> done with compiling and loading fused kernels. Compilation time: 27.029 seconds time to initialize megatron (seconds): 68.551 [after megatron is initialized] datetime: 2021-09-24 02:11:48 building GPT model ... 
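The wrong-compiler warning above comes from `torch.utils.cpp_extension`, which falls back to the generic `c++` driver when `$CXX` is unset. It is harmless here (the fused kernels do build), but it can usually be silenced by exporting `CXX=g++` before launch. A minimal sketch, under the assumption that the `g++` on `PATH` matches the GCC that PyTorch was built against:

```python
import os

# cpp_extension resolves the host compiler from $CXX, defaulting to plain
# `c++`; pointing it at g++ explicitly satisfies the ABI-compatibility check.
# (Assumption: run this in the launcher before anything triggers the
# JIT build of the fused kernels.)
os.environ["CXX"] = "g++"
```

Equivalently, `export CXX=g++` in the job script before invoking the training launcher.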
[2021-09-24 02:11:48,760] [INFO] [utils.py:680:see_memory_usage] Before Building Model
/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch/cuda/memory.py:373: FutureWarning: torch.cuda.memory_cached has been renamed to torch.cuda.memory_reserved
  warnings.warn(
/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch/cuda/memory.py:381: FutureWarning: torch.cuda.max_memory_cached has been renamed to torch.cuda.max_memory_reserved
  warnings.warn(
[2021-09-24 02:11:48,762] [INFO] [utils.py:681:see_memory_usage] MA 0.0 GB Max_MA 0.0 GB CA 0.0 GB Max_CA 0 GB
[2021-09-24 02:11:48,763] [INFO] [utils.py:689:see_memory_usage] CPU Virtual Memory: used = 37.77 GB, percent = 20.2%
SEED_LAYERS=False BASE_SEED=1234 SEED_FN=None
Using topology: {ProcessCoord(pipe=0, data=0, model=0): 0, ProcessCoord(pipe=0, data=0, model=1): 1, ProcessCoord(pipe=0, data=0, model=2): 2, ProcessCoord(pipe=0, data=0, model=3): 3, ProcessCoord(pipe=0, data=1, model=0): 4, ProcessCoord(pipe=0, data=1, model=1): 5, ProcessCoord(pipe=0, data=1, model=2): 6, ProcessCoord(pipe=0, data=1, model=3): 7, ProcessCoord(pipe=0, data=2, model=0): 8, ProcessCoord(pipe=0, data=2, model=1): 9, ProcessCoord(pipe=0, data=2, model=2): 10, ProcessCoord(pipe=0, data=2, model=3): 11, ProcessCoord(pipe=0, data=3, model=0): 12, ProcessCoord(pipe=0, data=3, model=1): 13, ProcessCoord(pipe=0, data=3, model=2): 14, ProcessCoord(pipe=0, data=3, model=3): 15, ProcessCoord(pipe=0, data=4, model=0): 16, ProcessCoord(pipe=0, data=4, model=1): 17, ProcessCoord(pipe=0, data=4, model=2): 18, ProcessCoord(pipe=0, data=4, model=3): 19, ProcessCoord(pipe=0, data=5, model=0): 20, ProcessCoord(pipe=0, data=5, model=1): 21, ProcessCoord(pipe=0, data=5, model=2): 22, ProcessCoord(pipe=0, data=5, model=3): 23, ProcessCoord(pipe=0, data=6, model=0): 24, ProcessCoord(pipe=0, data=6, model=1): 25, ProcessCoord(pipe=0, data=6, model=2): 26, ProcessCoord(pipe=0, data=6, model=3): 27, ProcessCoord(pipe=0,
data=7, model=0): 28, ProcessCoord(pipe=0, data=7, model=1): 29, ProcessCoord(pipe=0, data=7, model=2): 30, ProcessCoord(pipe=0, data=7, model=3): 31, ProcessCoord(pipe=1, data=0, model=0): 32, ProcessCoord(pipe=1, data=0, model=1): 33, ProcessCoord(pipe=1, data=0, model=2): 34, ProcessCoord(pipe=1, data=0, model=3): 35, ProcessCoord(pipe=1, data=1, model=0): 36, ProcessCoord(pipe=1, data=1, model=1): 37, ProcessCoord(pipe=1, data=1, model=2): 38, ProcessCoord(pipe=1, data=1, model=3): 39, ProcessCoord(pipe=1, data=2, model=0): 40, ProcessCoord(pipe=1, data=2, model=1): 41, ProcessCoord(pipe=1, data=2, model=2): 42, ProcessCoord(pipe=1, data=2, model=3): 43, ProcessCoord(pipe=1, data=3, model=0): 44, ProcessCoord(pipe=1, data=3, model=1): 45, ProcessCoord(pipe=1, data=3, model=2): 46, ProcessCoord(pipe=1, data=3, model=3): 47, ProcessCoord(pipe=1, data=4, model=0): 48, ProcessCoord(pipe=1, data=4, model=1): 49, ProcessCoord(pipe=1, data=4, model=2): 50, ProcessCoord(pipe=1, data=4, model=3): 51, ProcessCoord(pipe=1, data=5, model=0): 52, ProcessCoord(pipe=1, data=5, model=1): 53, ProcessCoord(pipe=1, data=5, model=2): 54, ProcessCoord(pipe=1, data=5, model=3): 55, ProcessCoord(pipe=1, data=6, model=0): 56, ProcessCoord(pipe=1, data=6, model=1): 57, ProcessCoord(pipe=1, data=6, model=2): 58, ProcessCoord(pipe=1, data=6, model=3): 59, ProcessCoord(pipe=1, data=7, model=0): 60, ProcessCoord(pipe=1, data=7, model=1): 61, ProcessCoord(pipe=1, data=7, model=2): 62, ProcessCoord(pipe=1, data=7, model=3): 63, ProcessCoord(pipe=2, data=0, model=0): 64, ProcessCoord(pipe=2, data=0, model=1): 65, ProcessCoord(pipe=2, data=0, model=2): 66, ProcessCoord(pipe=2, data=0, model=3): 67, ProcessCoord(pipe=2, data=1, model=0): 68, ProcessCoord(pipe=2, data=1, model=1): 69, ProcessCoord(pipe=2, data=1, model=2): 70, ProcessCoord(pipe=2, data=1, model=3): 71, ProcessCoord(pipe=2, data=2, model=0): 72, ProcessCoord(pipe=2, data=2, model=1): 73, ProcessCoord(pipe=2, data=2, model=2): 74, 
ProcessCoord(pipe=2, data=2, model=3): 75, ProcessCoord(pipe=2, data=3, model=0): 76, ProcessCoord(pipe=2, data=3, model=1): 77, ProcessCoord(pipe=2, data=3, model=2): 78, ProcessCoord(pipe=2, data=3, model=3): 79, ProcessCoord(pipe=2, data=4, model=0): 80, ProcessCoord(pipe=2, data=4, model=1): 81, ProcessCoord(pipe=2, data=4, model=2): 82, ProcessCoord(pipe=2, data=4, model=3): 83, ProcessCoord(pipe=2, data=5, model=0): 84, ProcessCoord(pipe=2, data=5, model=1): 85, ProcessCoord(pipe=2, data=5, model=2): 86, ProcessCoord(pipe=2, data=5, model=3): 87, ProcessCoord(pipe=2, data=6, model=0): 88, ProcessCoord(pipe=2, data=6, model=1): 89, ProcessCoord(pipe=2, data=6, model=2): 90, ProcessCoord(pipe=2, data=6, model=3): 91, ProcessCoord(pipe=2, data=7, model=0): 92, ProcessCoord(pipe=2, data=7, model=1): 93, ProcessCoord(pipe=2, data=7, model=2): 94, ProcessCoord(pipe=2, data=7, model=3): 95, ProcessCoord(pipe=3, data=0, model=0): 96, ProcessCoord(pipe=3, data=0, model=1): 97, ProcessCoord(pipe=3, data=0, model=2): 98, ProcessCoord(pipe=3, data=0, model=3): 99, ProcessCoord(pipe=3, data=1, model=0): 100, ProcessCoord(pipe=3, data=1, model=1): 101, ProcessCoord(pipe=3, data=1, model=2): 102, ProcessCoord(pipe=3, data=1, model=3): 103, ProcessCoord(pipe=3, data=2, model=0): 104, ProcessCoord(pipe=3, data=2, model=1): 105, ProcessCoord(pipe=3, data=2, model=2): 106, ProcessCoord(pipe=3, data=2, model=3): 107, ProcessCoord(pipe=3, data=3, model=0): 108, ProcessCoord(pipe=3, data=3, model=1): 109, ProcessCoord(pipe=3, data=3, model=2): 110, ProcessCoord(pipe=3, data=3, model=3): 111, ProcessCoord(pipe=3, data=4, model=0): 112, ProcessCoord(pipe=3, data=4, model=1): 113, ProcessCoord(pipe=3, data=4, model=2): 114, ProcessCoord(pipe=3, data=4, model=3): 115, ProcessCoord(pipe=3, data=5, model=0): 116, ProcessCoord(pipe=3, data=5, model=1): 117, ProcessCoord(pipe=3, data=5, model=2): 118, ProcessCoord(pipe=3, data=5, model=3): 119, ProcessCoord(pipe=3, data=6, model=0): 120, 
ProcessCoord(pipe=3, data=6, model=1): 121, ProcessCoord(pipe=3, data=6, model=2): 122, ProcessCoord(pipe=3, data=6, model=3): 123, ProcessCoord(pipe=3, data=7, model=0): 124, ProcessCoord(pipe=3, data=7, model=1): 125, ProcessCoord(pipe=3, data=7, model=2): 126, ProcessCoord(pipe=3, data=7, model=3): 127, ProcessCoord(pipe=4, data=0, model=0): 128, ProcessCoord(pipe=4, data=0, model=1): 129, ProcessCoord(pipe=4, data=0, model=2): 130, ProcessCoord(pipe=4, data=0, model=3): 131, ProcessCoord(pipe=4, data=1, model=0): 132, ProcessCoord(pipe=4, data=1, model=1): 133, ProcessCoord(pipe=4, data=1, model=2): 134, ProcessCoord(pipe=4, data=1, model=3): 135, ProcessCoord(pipe=4, data=2, model=0): 136, ProcessCoord(pipe=4, data=2, model=1): 137, ProcessCoord(pipe=4, data=2, model=2): 138, ProcessCoord(pipe=4, data=2, model=3): 139, ProcessCoord(pipe=4, data=3, model=0): 140, ProcessCoord(pipe=4, data=3, model=1): 141, ProcessCoord(pipe=4, data=3, model=2): 142, ProcessCoord(pipe=4, data=3, model=3): 143, ProcessCoord(pipe=4, data=4, model=0): 144, ProcessCoord(pipe=4, data=4, model=1): 145, ProcessCoord(pipe=4, data=4, model=2): 146, ProcessCoord(pipe=4, data=4, model=3): 147, ProcessCoord(pipe=4, data=5, model=0): 148, ProcessCoord(pipe=4, data=5, model=1): 149, ProcessCoord(pipe=4, data=5, model=2): 150, ProcessCoord(pipe=4, data=5, model=3): 151, ProcessCoord(pipe=4, data=6, model=0): 152, ProcessCoord(pipe=4, data=6, model=1): 153, ProcessCoord(pipe=4, data=6, model=2): 154, ProcessCoord(pipe=4, data=6, model=3): 155, ProcessCoord(pipe=4, data=7, model=0): 156, ProcessCoord(pipe=4, data=7, model=1): 157, ProcessCoord(pipe=4, data=7, model=2): 158, ProcessCoord(pipe=4, data=7, model=3): 159, ProcessCoord(pipe=5, data=0, model=0): 160, ProcessCoord(pipe=5, data=0, model=1): 161, ProcessCoord(pipe=5, data=0, model=2): 162, ProcessCoord(pipe=5, data=0, model=3): 163, ProcessCoord(pipe=5, data=1, model=0): 164, ProcessCoord(pipe=5, data=1, model=1): 165, 
ProcessCoord(pipe=5, data=1, model=2): 166, ProcessCoord(pipe=5, data=1, model=3): 167, ProcessCoord(pipe=5, data=2, model=0): 168, ProcessCoord(pipe=5, data=2, model=1): 169, ProcessCoord(pipe=5, data=2, model=2): 170, ProcessCoord(pipe=5, data=2, model=3): 171, ProcessCoord(pipe=5, data=3, model=0): 172, ProcessCoord(pipe=5, data=3, model=1): 173, ProcessCoord(pipe=5, data=3, model=2): 174, ProcessCoord(pipe=5, data=3, model=3): 175, ProcessCoord(pipe=5, data=4, model=0): 176, ProcessCoord(pipe=5, data=4, model=1): 177, ProcessCoord(pipe=5, data=4, model=2): 178, ProcessCoord(pipe=5, data=4, model=3): 179, ProcessCoord(pipe=5, data=5, model=0): 180, ProcessCoord(pipe=5, data=5, model=1): 181, ProcessCoord(pipe=5, data=5, model=2): 182, ProcessCoord(pipe=5, data=5, model=3): 183, ProcessCoord(pipe=5, data=6, model=0): 184, ProcessCoord(pipe=5, data=6, model=1): 185, ProcessCoord(pipe=5, data=6, model=2): 186, ProcessCoord(pipe=5, data=6, model=3): 187, ProcessCoord(pipe=5, data=7, model=0): 188, ProcessCoord(pipe=5, data=7, model=1): 189, ProcessCoord(pipe=5, data=7, model=2): 190, ProcessCoord(pipe=5, data=7, model=3): 191, ProcessCoord(pipe=6, data=0, model=0): 192, ProcessCoord(pipe=6, data=0, model=1): 193, ProcessCoord(pipe=6, data=0, model=2): 194, ProcessCoord(pipe=6, data=0, model=3): 195, ProcessCoord(pipe=6, data=1, model=0): 196, ProcessCoord(pipe=6, data=1, model=1): 197, ProcessCoord(pipe=6, data=1, model=2): 198, ProcessCoord(pipe=6, data=1, model=3): 199, ProcessCoord(pipe=6, data=2, model=0): 200, ProcessCoord(pipe=6, data=2, model=1): 201, ProcessCoord(pipe=6, data=2, model=2): 202, ProcessCoord(pipe=6, data=2, model=3): 203, ProcessCoord(pipe=6, data=3, model=0): 204, ProcessCoord(pipe=6, data=3, model=1): 205, ProcessCoord(pipe=6, data=3, model=2): 206, ProcessCoord(pipe=6, data=3, model=3): 207, ProcessCoord(pipe=6, data=4, model=0): 208, ProcessCoord(pipe=6, data=4, model=1): 209, ProcessCoord(pipe=6, data=4, model=2): 210, 
ProcessCoord(pipe=6, data=4, model=3): 211, ProcessCoord(pipe=6, data=5, model=0): 212, ProcessCoord(pipe=6, data=5, model=1): 213, ProcessCoord(pipe=6, data=5, model=2): 214, ProcessCoord(pipe=6, data=5, model=3): 215, ProcessCoord(pipe=6, data=6, model=0): 216, ProcessCoord(pipe=6, data=6, model=1): 217, ProcessCoord(pipe=6, data=6, model=2): 218, ProcessCoord(pipe=6, data=6, model=3): 219, ProcessCoord(pipe=6, data=7, model=0): 220, ProcessCoord(pipe=6, data=7, model=1): 221, ProcessCoord(pipe=6, data=7, model=2): 222, ProcessCoord(pipe=6, data=7, model=3): 223, ProcessCoord(pipe=7, data=0, model=0): 224, ProcessCoord(pipe=7, data=0, model=1): 225, ProcessCoord(pipe=7, data=0, model=2): 226, ProcessCoord(pipe=7, data=0, model=3): 227, ProcessCoord(pipe=7, data=1, model=0): 228, ProcessCoord(pipe=7, data=1, model=1): 229, ProcessCoord(pipe=7, data=1, model=2): 230, ProcessCoord(pipe=7, data=1, model=3): 231, ProcessCoord(pipe=7, data=2, model=0): 232, ProcessCoord(pipe=7, data=2, model=1): 233, ProcessCoord(pipe=7, data=2, model=2): 234, ProcessCoord(pipe=7, data=2, model=3): 235, ProcessCoord(pipe=7, data=3, model=0): 236, ProcessCoord(pipe=7, data=3, model=1): 237, ProcessCoord(pipe=7, data=3, model=2): 238, ProcessCoord(pipe=7, data=3, model=3): 239, ProcessCoord(pipe=7, data=4, model=0): 240, ProcessCoord(pipe=7, data=4, model=1): 241, ProcessCoord(pipe=7, data=4, model=2): 242, ProcessCoord(pipe=7, data=4, model=3): 243, ProcessCoord(pipe=7, data=5, model=0): 244, ProcessCoord(pipe=7, data=5, model=1): 245, ProcessCoord(pipe=7, data=5, model=2): 246, ProcessCoord(pipe=7, data=5, model=3): 247, ProcessCoord(pipe=7, data=6, model=0): 248, ProcessCoord(pipe=7, data=6, model=1): 249, ProcessCoord(pipe=7, data=6, model=2): 250, ProcessCoord(pipe=7, data=6, model=3): 251, ProcessCoord(pipe=7, data=7, model=0): 252, ProcessCoord(pipe=7, data=7, model=1): 253, ProcessCoord(pipe=7, data=7, model=2): 254, ProcessCoord(pipe=7, data=7, model=3): 255} [2021-09-24 
02:11:50,155] [INFO] [module.py:360:_partition_layers] Partitioning pipeline stages with method type:transformer
stage=0 layers=7
     0: _to_float16
     1: EmbeddingPipe
     2:
     3: ParallelTransformerLayerPipe
     4: ParallelTransformerLayerPipe
     5: ParallelTransformerLayerPipe
     6: ParallelTransformerLayerPipe
stage=1 layers=4
     7: ParallelTransformerLayerPipe
     8: ParallelTransformerLayerPipe
     9: ParallelTransformerLayerPipe
    10: ParallelTransformerLayerPipe
stage=2 layers=4
    11: ParallelTransformerLayerPipe
    12: ParallelTransformerLayerPipe
    13: ParallelTransformerLayerPipe
    14: ParallelTransformerLayerPipe
stage=3 layers=4
    15: ParallelTransformerLayerPipe
    16: ParallelTransformerLayerPipe
    17: ParallelTransformerLayerPipe
    18: ParallelTransformerLayerPipe
stage=4 layers=4
    19: ParallelTransformerLayerPipe
    20: ParallelTransformerLayerPipe
    21: ParallelTransformerLayerPipe
    22: ParallelTransformerLayerPipe
stage=5 layers=4
    23: ParallelTransformerLayerPipe
    24: ParallelTransformerLayerPipe
    25: ParallelTransformerLayerPipe
    26: ParallelTransformerLayerPipe
stage=6 layers=4
    27: ParallelTransformerLayerPipe
    28: ParallelTransformerLayerPipe
    29: ParallelTransformerLayerPipe
    30: ParallelTransformerLayerPipe
stage=7 layers=8
    31: ParallelTransformerLayerPipe
    32: ParallelTransformerLayerPipe
    33: ParallelTransformerLayerPipe
    34: ParallelTransformerLayerPipe
    35:
    36: MixedFusedLayerNorm
    37: EmbeddingPipe
    38: float16_to_fp32
  loss: CrossEntropy
> number of parameters on (tensor, pipeline) model parallel rank (0, 3): 1745293312
> number of parameters on (tensor, pipeline) model parallel rank (1, 1): 1745293312
> number of parameters on (tensor, pipeline) model parallel rank (3, 1): 1745293312
> number of parameters on (tensor, pipeline) model parallel rank (0, 1): 1745293312
> number of parameters on (tensor, pipeline) model parallel rank (2, 1): 1745293312
> number of parameters on (tensor, pipeline) model parallel rank (3, 3): 1745293312
> number of parameters on (tensor, pipeline) model parallel rank (1, 3): 1745293312
> number of parameters on (tensor, pipeline) model parallel rank (2, 3): 1745293312
> number of parameters on (tensor, pipeline) model parallel rank (3, 6): 1745293312
> number of parameters on (tensor, pipeline) model parallel rank (0, 6): 1745293312
> number of parameters on (tensor, pipeline) model parallel rank (1, 6): 1745293312
> number of parameters on (tensor, pipeline) model parallel rank (2, 6): 1745293312
> number of parameters on (tensor, pipeline) model parallel rank (2, 2): 1745293312
> number of parameters on (tensor, pipeline) model parallel rank (0, 2): 1745293312
> number of parameters on (tensor, pipeline) model parallel rank (1, 2): 1745293312
> number of parameters on (tensor, pipeline) model parallel rank (3, 2): 1745293312
> number of parameters on (tensor, pipeline) model parallel rank (0, 5): 1745293312
> number of parameters on (tensor, pipeline) model parallel rank (1, 5): 1745293312
> number of parameters on (tensor, pipeline) model parallel rank (3, 5): 1745293312
> number of parameters on (tensor, pipeline) model parallel rank (2, 5): 1745293312
> number of parameters on (tensor, pipeline) model parallel rank (0, 4): 1745293312
> number of parameters on (tensor, pipeline) model parallel rank (2, 4): 1745293312
> number of parameters on (tensor, pipeline) model parallel rank (1, 4): 1745293312
> number of parameters on (tensor, pipeline) model parallel rank (3, 4): 1745293312
> number of parameters on (tensor, pipeline) model parallel rank (2, 7): 1986498560
> number of parameters on (tensor, pipeline) model parallel rank (1, 7): 1986498560
> number of parameters on (tensor, pipeline) model parallel rank (0, 7): 1986498560
> number of parameters on (tensor, pipeline) model parallel rank (3, 7): 1986498560
> number of parameters on (tensor, pipeline) model parallel rank (2, 0): 1986465792
> number of parameters on (tensor, pipeline) model parallel rank (3, 0): 1986465792
> number of parameters on (tensor, pipeline) model parallel rank (1, 0): 1986465792
[2021-09-24 02:11:51,439] [INFO] [utils.py:680:see_memory_usage] After Building Model
[2021-09-24 02:11:51,440] [INFO] [utils.py:681:see_memory_usage] MA 3.77 GB Max_MA 3.79 GB CA 3.79 GB Max_CA 4 GB
[2021-09-24 02:11:51,441] [INFO] [utils.py:689:see_memory_usage] CPU Virtual Memory: used = 37.96 GB, percent = 20.3%
> number of parameters on (tensor, pipeline) model parallel rank (0, 0): 1986465792
setting training iterations to 159576
> learning rate decay style: cosine
DeepSpeed is enabled.
[2021-09-24 02:11:51,495] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed info: version=0.4.2+bc17042, git-hash=bc17042, git-branch=big-science
[2021-09-24 02:11:51,606] [INFO] [engine.py:179:__init__] DeepSpeed Flops Profiler Enabled: False
[2021-09-24 02:11:51,606] [INFO] [engine.py:736:_configure_optimizer] Removing param_group that has no 'params' in the client Optimizer
[2021-09-24 02:11:51,606] [INFO] [engine.py:741:_configure_optimizer] Using client Optimizer as basic optimizer
[2021-09-24 02:11:51,606] [INFO] [engine.py:750:_configure_optimizer] DeepSpeed Basic Optimizer = FusedAdam
[2021-09-24 02:11:51,607] [INFO] [utils.py:43:is_zero_supported_optimizer] Checking ZeRO support for optimizer=FusedAdam type=
[2021-09-24 02:11:51,607] [INFO] [logging.py:68:log_dist] [Rank 0] Creating fp16 ZeRO stage 1 optimizer
[2021-09-24 02:11:51,607] [INFO] [stage2.py:106:__init__] Reduce bucket size 500000000
[2021-09-24 02:11:51,607] [INFO] [stage2.py:107:__init__] Allgather bucket size 500000000
[2021-09-24 02:11:51,607] [INFO] [stage2.py:108:__init__] CPU Offload: False
[2021-09-24 02:11:51,607] [INFO] [stage2.py:109:__init__] Round robin gradient partitioning: False
[2021-09-24 02:11:56,299] [INFO] [stage2.py:419:__init__] optimizer state initialized
[2021-09-24 02:11:56,299] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed Final Optimizer = FusedAdam
[2021-09-24 02:11:56,299] [INFO] [engine.py:553:_configure_lr_scheduler] DeepSpeed using
client LR scheduler
[2021-09-24 02:11:56,299] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed LR Scheduler =
[2021-09-24 02:11:56,300] [INFO] [logging.py:68:log_dist] [Rank 0] step=0, skipped=0, lr=[0.0, 0.0], mom=[(0.9, 0.999), (0.9, 0.999)]
[2021-09-24 02:11:56,300] [INFO] [config.py:900:print] DeepSpeedEngine configuration:
[2021-09-24 02:11:56,300] [INFO] [config.py:904:print] activation_checkpointing_config { "partition_activations": false, "contiguous_memory_optimization": false, "cpu_checkpointing": false, "number_checkpoints": null, "synchronize_checkpoint_boundary": false, "profile": false }
[2021-09-24 02:11:56,300] [INFO] [config.py:904:print] aio_config ................... {'block_size': 1048576, 'queue_depth': 8, 'thread_count': 1, 'single_submit': False, 'overlap_events': True}
[2021-09-24 02:11:56,300] [INFO] [config.py:904:print] allreduce_always_fp32 ........ False
[2021-09-24 02:11:56,300] [INFO] [config.py:904:print] amp_enabled .................. False
[2021-09-24 02:11:56,300] [INFO] [config.py:904:print] amp_params ................... False
[2021-09-24 02:11:56,300] [INFO] [config.py:904:print] checkpoint_tag_validation_enabled True
[2021-09-24 02:11:56,300] [INFO] [config.py:904:print] checkpoint_tag_validation_fail False
[2021-09-24 02:11:56,300] [INFO] [config.py:904:print] disable_allgather ............ False
[2021-09-24 02:11:56,300] [INFO] [config.py:904:print] dump_state ................... False
[2021-09-24 02:11:56,300] [INFO] [config.py:904:print] dynamic_loss_scale_args ...... {'init_scale': 4096, 'scale_window': 500, 'delayed_shift': 2, 'min_scale': 1}
[2021-09-24 02:11:56,300] [INFO] [config.py:904:print] eigenvalue_enabled ........... False
[2021-09-24 02:11:56,300] [INFO] [config.py:904:print] eigenvalue_gas_boundary_resolution 1
[2021-09-24 02:11:56,300] [INFO] [config.py:904:print] eigenvalue_layer_name ........ bert.encoder.layer
[2021-09-24 02:11:56,300] [INFO] [config.py:904:print] eigenvalue_layer_num ......... 0
[2021-09-24 02:11:56,300] [INFO] [config.py:904:print] eigenvalue_max_iter .......... 100
[2021-09-24 02:11:56,300] [INFO] [config.py:904:print] eigenvalue_stability ......... 1e-06
[2021-09-24 02:11:56,300] [INFO] [config.py:904:print] eigenvalue_tol ............... 0.01
[2021-09-24 02:11:56,300] [INFO] [config.py:904:print] eigenvalue_verbose ........... False
[2021-09-24 02:11:56,300] [INFO] [config.py:904:print] elasticity_enabled ........... False
[2021-09-24 02:11:56,301] [INFO] [config.py:904:print] flops_profiler_config ........ { "enabled": false, "profile_step": 1, "module_depth": -1, "top_modules": 1, "detailed": true, "output_file": null }
[2021-09-24 02:11:56,301] [INFO] [config.py:904:print] fp16_enabled ................. True
[2021-09-24 02:11:56,301] [INFO] [config.py:904:print] fp16_mixed_quantize .......... False
[2021-09-24 02:11:56,301] [INFO] [config.py:904:print] global_rank .................. 0
[2021-09-24 02:11:56,301] [INFO] [config.py:904:print] gradient_accumulation_steps .. 256
[2021-09-24 02:11:56,301] [INFO] [config.py:904:print] gradient_clipping ............ 1.0
[2021-09-24 02:11:56,301] [INFO] [config.py:904:print] gradient_predivide_factor .... 1.0
[2021-09-24 02:11:56,301] [INFO] [config.py:904:print] initial_dynamic_scale ........ 4096
[2021-09-24 02:11:56,301] [INFO] [config.py:904:print] loss_scale ................... 0
[2021-09-24 02:11:56,301] [INFO] [config.py:904:print] memory_breakdown ............. False
[2021-09-24 02:11:56,301] [INFO] [config.py:904:print] optimizer_legacy_fusion ...... False
[2021-09-24 02:11:56,301] [INFO] [config.py:904:print] optimizer_name ............... None
[2021-09-24 02:11:56,301] [INFO] [config.py:904:print] optimizer_params ............. None
[2021-09-24 02:11:56,301] [INFO] [config.py:904:print] pipeline ..................... {'stages': 'auto', 'partition': 'best', 'seed_layers': False, 'activation_checkpoint_interval': 0}
[2021-09-24 02:11:56,301] [INFO] [config.py:904:print] pld_enabled .................. False
[2021-09-24 02:11:56,301] [INFO] [config.py:904:print] pld_params ................... False
[2021-09-24 02:11:56,301] [INFO] [config.py:904:print] prescale_gradients ........... False
[2021-09-24 02:11:56,301] [INFO] [config.py:904:print] quantize_change_rate ......... 0.001
[2021-09-24 02:11:56,301] [INFO] [config.py:904:print] quantize_groups .............. 1
[2021-09-24 02:11:56,301] [INFO] [config.py:904:print] quantize_offset .............. 1000
[2021-09-24 02:11:56,301] [INFO] [config.py:904:print] quantize_period .............. 1000
[2021-09-24 02:11:56,301] [INFO] [config.py:904:print] quantize_rounding ............ 0
[2021-09-24 02:11:56,301] [INFO] [config.py:904:print] quantize_start_bits .......... 16
[2021-09-24 02:11:56,301] [INFO] [config.py:904:print] quantize_target_bits ......... 8
[2021-09-24 02:11:56,301] [INFO] [config.py:904:print] quantize_training_enabled .... False
[2021-09-24 02:11:56,301] [INFO] [config.py:904:print] quantize_type ................ 0
[2021-09-24 02:11:56,301] [INFO] [config.py:904:print] quantize_verbose ............. False
[2021-09-24 02:11:56,301] [INFO] [config.py:904:print] scheduler_name ............... None
[2021-09-24 02:11:56,301] [INFO] [config.py:904:print] scheduler_params ............. None
[2021-09-24 02:11:56,301] [INFO] [config.py:904:print] sparse_attention ............. None
[2021-09-24 02:11:56,301] [INFO] [config.py:904:print] sparse_gradients_enabled ..... False
[2021-09-24 02:11:56,301] [INFO] [config.py:904:print] steps_per_print .............. 2000
[2021-09-24 02:11:56,301] [INFO] [config.py:904:print] tensorboard_enabled .......... False
[2021-09-24 02:11:56,301] [INFO] [config.py:904:print] tensorboard_job_name ......... DeepSpeedJobName
[2021-09-24 02:11:56,301] [INFO] [config.py:904:print] tensorboard_output_path ......
[2021-09-24 02:11:56,301] [INFO] [config.py:904:print] train_batch_size ............. 2048
[2021-09-24 02:11:56,301] [INFO] [config.py:904:print] train_micro_batch_size_per_gpu 1
[2021-09-24 02:11:56,301] [INFO] [config.py:904:print] use_quantizer_kernel ......... False
[2021-09-24 02:11:56,302] [INFO] [config.py:904:print] wall_clock_breakdown ......... False
[2021-09-24 02:11:56,302] [INFO] [config.py:904:print] world_size ................... 8
[2021-09-24 02:11:56,302] [INFO] [config.py:904:print] zero_allow_untested_optimizer False
[2021-09-24 02:11:56,302] [INFO] [config.py:904:print] zero_config .................. { "stage": 1, "contiguous_gradients": false, "reduce_scatter": true, "reduce_bucket_size": 5.000000e+08, "allgather_partitions": true, "allgather_bucket_size": 5.000000e+08, "overlap_comm": false, "load_from_fp32_weights": true, "elastic_checkpoint": true, "offload_param": null, "offload_optimizer": null, "sub_group_size": 1.000000e+09, "prefetch_bucket_size": 5.000000e+07, "param_persistence_threshold": 1.000000e+05, "max_live_parameters": 1.000000e+09, "max_reuse_distance": 1.000000e+09, "gather_fp16_weights_on_model_save": false, "ignore_unused_parameters": true, "round_robin_gradients": false, "legacy_stage1": false }
[2021-09-24 02:11:56,302] [INFO] [config.py:904:print] zero_enabled ................. True
[2021-09-24 02:11:56,302] [INFO] [config.py:904:print] zero_optimization_stage ......
1
[2021-09-24 02:11:56,302] [INFO] [config.py:906:print] json = { "train_micro_batch_size_per_gpu": 1, "train_batch_size": 2.048000e+03, "gradient_clipping": 1.0, "zero_optimization": { "stage": 1 }, "fp16": { "enabled": true, "loss_scale": 0, "loss_scale_window": 500, "hysteresis": 2, "min_loss_scale": 1, "initial_scale_power": 12 }, "steps_per_print": 2.000000e+03, "wall_clock_breakdown": false }
[2021-09-24 02:11:56,302] [INFO] [engine.py:76:__init__] CONFIG: micro_batches=256 micro_batch_size=1
[2021-09-24 02:11:56,606] [INFO] [engine.py:134:__init__] RANK=0 STAGE=0 LAYERS=7 [0, 7) STAGE_PARAMS=1986465792 (1986.466M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-24 02:11:56,606] [INFO] [engine.py:134:__init__] RANK=3 STAGE=0 LAYERS=7 [0, 7) STAGE_PARAMS=1986465792 (1986.466M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-24 02:11:56,606] [INFO] [engine.py:134:__init__] RANK=1 STAGE=0 LAYERS=7 [0, 7) STAGE_PARAMS=1986465792 (1986.466M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-24 02:11:56,606] [INFO] [engine.py:134:__init__] RANK=2 STAGE=0 LAYERS=7 [0, 7) STAGE_PARAMS=1986465792 (1986.466M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-24 02:11:56,606] [INFO] [engine.py:134:__init__] RANK=131 STAGE=4 LAYERS=4 [19, 23) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-24 02:11:56,606] [INFO] [engine.py:134:__init__] RANK=128 STAGE=4 LAYERS=4 [19, 23) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-24 02:11:56,606] [INFO] [engine.py:134:__init__] RANK=129 STAGE=4 LAYERS=4 [19, 23) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-24 02:11:56,606] [INFO] [engine.py:134:__init__] RANK=194
STAGE=6 LAYERS=4 [27, 31) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-24 02:11:56,606] [INFO] [engine.py:134:__init__] RANK=193 STAGE=6 LAYERS=4 [27, 31) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-24 02:11:56,606] [INFO] [engine.py:134:__init__] RANK=192 STAGE=6 LAYERS=4 [27, 31) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-24 02:11:56,606] [INFO] [engine.py:134:__init__] RANK=195 STAGE=6 LAYERS=4 [27, 31) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-24 02:11:56,606] [INFO] [engine.py:134:__init__] RANK=64 STAGE=2 LAYERS=4 [11, 15) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-24 02:11:56,606] [INFO] [engine.py:134:__init__] RANK=66 STAGE=2 LAYERS=4 [11, 15) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-24 02:11:56,606] [INFO] [engine.py:134:__init__] RANK=65 STAGE=2 LAYERS=4 [11, 15) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-24 02:11:56,606] [INFO] [engine.py:134:__init__] RANK=67 STAGE=2 LAYERS=4 [11, 15) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-24 02:11:56,606] [INFO] [engine.py:134:__init__] RANK=32 STAGE=1 LAYERS=4 [7, 11) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-24 02:11:56,606] [INFO] [engine.py:134:__init__] RANK=33 STAGE=1 LAYERS=4 [7, 11) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-24 02:11:56,606] [INFO] [engine.py:134:__init__] RANK=35 STAGE=1 LAYERS=4 [7, 11) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-24 02:11:56,606] [INFO] [engine.py:134:__init__] RANK=130 STAGE=4 LAYERS=4 [19, 23) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-24 02:11:56,606] [INFO] [engine.py:134:__init__] RANK=97 STAGE=3 LAYERS=4 [15, 19) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-24 02:11:56,606] [INFO] [engine.py:134:__init__] RANK=96 STAGE=3 LAYERS=4 [15, 19) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-24 02:11:56,606] [INFO] [engine.py:134:__init__] RANK=99 STAGE=3 LAYERS=4 [15, 19) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-24 02:11:56,606] [INFO] [engine.py:134:__init__] RANK=98 STAGE=3 LAYERS=4 [15, 19) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-24 02:11:56,606] [INFO] [engine.py:134:__init__] RANK=224 STAGE=7 LAYERS=8 [31, 39) STAGE_PARAMS=1986498560 (1986.499M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-24 02:11:56,606] [INFO] [engine.py:134:__init__] RANK=225 STAGE=7 LAYERS=8 [31, 39) STAGE_PARAMS=1986498560 (1986.499M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-24 02:11:56,606] [INFO] [engine.py:134:__init__] RANK=226 STAGE=7 LAYERS=8 [31, 39) STAGE_PARAMS=1986498560 (1986.499M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-24 02:11:56,606] [INFO] [engine.py:134:__init__] RANK=227 STAGE=7 LAYERS=8 [31, 39) STAGE_PARAMS=1986498560 (1986.499M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-24 02:11:56,606] [INFO] [engine.py:134:__init__] RANK=160 STAGE=5 LAYERS=4 [23, 27) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-24 02:11:56,606] [INFO] [engine.py:134:__init__] RANK=163 STAGE=5 LAYERS=4 [23, 27) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-24 02:11:56,606] [INFO] [engine.py:134:__init__] RANK=161 STAGE=5 LAYERS=4 [23, 27) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-24 02:11:56,606] [INFO] [engine.py:134:__init__] RANK=162 STAGE=5 LAYERS=4 [23, 27) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-24 02:11:56,606] [INFO] [engine.py:134:__init__] RANK=34 STAGE=1 LAYERS=4 [7, 11) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-24 02:11:56,752] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint.
WARNING: could not find the metadata file /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints will not load any checkpoints and will start from random
[2021-09-24 02:11:56,752] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,752] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,752] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,752] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,752] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,752] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. 
[2021-09-24 02:11:56,752] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,752] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,752] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,752] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,752] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,752] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. 
[2021-09-24 02:11:56,752] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,752] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,752] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,752] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,752] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,752] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. 
[2021-09-24 02:11:56,752] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,752] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,752] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,752] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,752] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,752] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. 
[2021-09-24 02:11:56,752] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,752] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,752] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,752] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,752] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,752] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. 
[2021-09-24 02:11:56,752] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,752] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,752] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,752] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,752] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,752] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. 
[2021-09-24 02:11:56,752] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,752] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,752] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,752] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,752] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,752] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. 
[2021-09-24 02:11:56,752] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,752] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,752] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,752] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,752] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,752] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. 
[2021-09-24 02:11:56,752] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,752] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,752] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,752] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,752] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,752] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. 
[2021-09-24 02:11:56,752] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,752] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,752] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,752] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,752] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,752] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. 
[2021-09-24 02:11:56,752] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,752] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,752] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,752] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,752] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,752] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. 
[2021-09-24 02:11:56,752] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,752] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,752] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,752] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,752] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,752] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. 
[2021-09-24 02:11:56,752] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,752] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,752] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,752] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,752] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,752] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. 
[2021-09-24 02:11:56,752] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,752] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,752] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,752] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,752] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,752] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. 
[2021-09-24 02:11:56,752] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint.
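The warning above is emitted per rank on a fresh start: the loader looks for a `latest` tag file inside the checkpoint directory and, finding none, resumes nothing. A minimal sketch of that convention, assuming the `latest` file simply holds the tag string of the most recent checkpoint (illustrative only, not DeepSpeed's actual implementation):

```python
import os
from typing import Optional

def resolve_checkpoint_tag(load_dir: str, tag: Optional[str] = None) -> Optional[str]:
    """Pick which checkpoint tag to load, mimicking the 'latest' file
    convention the warning refers to (hypothetical helper, not DeepSpeed code)."""
    if tag is not None:
        # An explicitly passed tag always wins, as the warning suggests.
        return tag
    latest_path = os.path.join(load_dir, "latest")
    if not os.path.isfile(latest_path):
        # The situation logged above: no 'latest' file yet, nothing to
        # resume from -- training starts from scratch.
        return None
    with open(latest_path) as f:
        # The file is expected to contain a single tag, e.g. "global_step1000".
        return f.read().strip()
```

On the very first run of a training there is no checkpoint to find, so the warning is expected and harmless.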
time (ms) | load-checkpoint: 1.91
[after model, optimizer, and learning rate scheduler are built] datetime: 2021-09-24 02:11:56
> building train, validation, and test datasets ...
 > datasets target sizes (minimum size):
    train:      300000000
    validation: 1638400
    test:       10240
> building train, validation, and test datasets for GPT ...
 > building dataset index ...
    reading sizes...
    reading pointers...
    reading document index...
    creating numpy buffer of mmap...
    creating memory view of numpy buffer...
 > finished creating indexed dataset in 0.214922 seconds
    number of documents: 304230423
 > dataset split:
    train:
     document indices in [0, 288714672) total of 288714672 documents
    validation:
     document indices in [288714672, 303926193) total of 15211521 documents
    test:
     document indices in [303926193, 304230423) total of 304230 documents
 > loading doc-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_train_indexmap_300000000ns_2048sl_42s_doc_idx.npy
 > loading sample-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_train_indexmap_300000000ns_2048sl_42s_sample_idx.npy
 > loading shuffle-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_train_indexmap_300000000ns_2048sl_42s_shuffle_idx.npy
    loaded indexed file in 0.337 seconds
    total number of samples: 394611670
    total number of epochs: 3
 > loading doc-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_valid_indexmap_1638400ns_2048sl_42s_doc_idx.npy
 > loading sample-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_valid_indexmap_1638400ns_2048sl_42s_sample_idx.npy
 > loading shuffle-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_valid_indexmap_1638400ns_2048sl_42s_shuffle_idx.npy
    loaded indexed file in 0.309 seconds
    total number of samples: 6927161
    total number of epochs: 1
 > loading doc-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_test_indexmap_10240ns_2048sl_42s_doc_idx.npy
 > loading sample-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_test_indexmap_10240ns_2048sl_42s_sample_idx.npy
 > loading shuffle-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_test_indexmap_10240ns_2048sl_42s_shuffle_idx.npy
    loaded indexed file in 0.060 seconds
    total number of samples: 137384
    total number of epochs: 1
> finished creating GPT datasets ...
[after dataloaders are built] datetime: 2021-09-24 02:12:03
done with setup ...
training ...
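The split boundaries printed above can be recovered from the document count alone. A quick sketch, assuming a 949/50/1 train/valid/test split (the split string itself does not appear in this log, so the weights are an assumption based on common Megatron-LM configurations); the cumulative-round-then-adjust step mirrors how Megatron-style splitters force the last boundary to equal the dataset size:

```python
def split_indices(size, weights):
    """Compute cumulative split boundaries for `size` documents given
    integer split weights (assumed 949/50/1 here -- hypothetical)."""
    total = sum(weights)
    idx = [0]
    for w in weights:
        # round each split's document count from its fraction of the corpus
        idx.append(idx[-1] + int(round(w / total * size)))
    # rounding can leave the final boundary off by a document or two;
    # shift every boundary after the first so idx[-1] == size exactly
    diff = idx[-1] - size
    for i in range(1, len(idx)):
        idx[i] -= diff
    return idx

print(split_indices(304230423, [949, 50, 1]))
# -> [0, 288714672, 303926193, 304230423], matching the logged intervals
```

The resulting intervals reproduce the log exactly: 288714672 train, 15211521 validation, and 304230 test documents.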
time (ms) | model-and-optimizer-setup: 8062.72 | train/valid/test-data-iterators-setup: 5729.09
[before the start of training step] datetime: 2021-09-24 02:12:03
[2021-09-24 02:12:03,365] [INFO] [checkpointing.py:408:forward] Activation Checkpointing Information
[2021-09-24 02:12:03,365] [INFO] [checkpointing.py:409:forward] ----Partition Activations False, CPU CHECKPOINTING False
[2021-09-24 02:12:03,365] [INFO] [checkpointing.py:412:forward] ----contiguous Memory Checkpointing False with 32 total layers
[2021-09-24 02:12:03,365] [INFO] [checkpointing.py:415:forward] ----Synchronization False
[2021-09-24 02:12:03,365] [INFO] [checkpointing.py:416:forward] ----Profiling time in checkpointing False
[Rank 1] (after 1 iterations) memory (MB) | allocated: 6661.611328125 | max allocated: 11742.55810546875 | reserved: 21150.0 | max reserved: 21150.0
[Rank 33] (after 1 iterations) memory (MB) | allocated: 5861.5498046875 | max allocated: 10450.46337890625 | reserved: 18442.0 | max reserved: 18442.0
[Rank 65] (after 1 iterations) memory (MB) | allocated: 5861.5498046875 | max allocated: 10450.46337890625 | reserved: 18442.0 | max reserved: 18442.0
[Rank 97] (after 1 iterations) memory (MB) | allocated: 5861.5498046875 | max allocated: 10450.46337890625 | reserved: 18442.0 | max reserved: 18442.0
[Rank 225] (after 1 iterations) memory (MB) | allocated: 7107.70751953125 | max allocated: 11884.6845703125 | reserved: 22492.0 | max reserved: 22492.0
[Rank 129] (after 1 iterations) memory (MB) | allocated: 5861.5498046875 | max allocated: 10450.46337890625 | reserved: 18442.0 | max reserved: 18442.0
[Rank 193] (after 1 iterations) memory (MB) | allocated: 5861.5498046875 | max allocated: 10450.46337890625 | reserved: 18586.0 | max reserved: 18586.0
[Rank 161] (after 1 iterations) memory (MB) | allocated: 5861.5498046875 | max allocated: 10450.46337890625 | reserved: 18442.0 | max reserved: 18442.0
[Rank 2] (after 1 iterations) memory (MB) | allocated: 6661.611328125 | max allocated: 11742.55810546875 | reserved: 21150.0 | max reserved: 21150.0
[Rank 34] (after 1 iterations) memory (MB) | allocated: 5861.5498046875 | max allocated: 10450.46337890625 | reserved: 18442.0 | max reserved: 18442.0
[Rank 226] (after 1 iterations) memory (MB) | allocated: 7107.70751953125 | max allocated: 11884.6845703125 | reserved: 21700.0 | max reserved: 21700.0
[Rank 66] (after 1 iterations) memory (MB) | allocated: 5861.5498046875 | max allocated: 10450.46337890625 | reserved: 18778.0 | max reserved: 18778.0
[Rank 98] (after 1 iterations) memory (MB) | allocated: 5861.5498046875 | max allocated: 10450.46337890625 | reserved: 18586.0 | max reserved: 18586.0
[Rank 130] (after 1 iterations) memory (MB) | allocated: 5861.5498046875 | max allocated: 10450.46337890625 | reserved: 18442.0 | max reserved: 18442.0
[Rank 194] (after 1 iterations) memory (MB) | allocated: 5861.5498046875 | max allocated: 10450.46337890625 | reserved: 18650.0 | max reserved: 18650.0
[Rank 162] (after 1 iterations) memory (MB) | allocated: 5861.5498046875 | max allocated: 10450.46337890625 | reserved: 18442.0 | max reserved: 18442.0
[Rank 0] (after 1 iterations) memory (MB) | allocated: 6661.611328125 | max allocated: 11742.55810546875 | reserved: 21470.0 | max reserved: 21470.0
[Rank 64] (after 1 iterations) memory (MB) | allocated: 5861.5498046875 | max allocated: 10450.46337890625 | reserved: 19252.0 | max reserved: 19252.0
[Rank 32] (after 1 iterations) memory (MB) | allocated: 5861.5498046875 | max allocated: 10450.46337890625 | reserved: 18868.0 | max reserved: 18868.0
[Rank 128] (after 1 iterations) memory (MB) | allocated: 5861.5498046875 | max allocated: 10450.46337890625 | reserved: 18868.0 | max reserved: 18868.0
[Rank 96] (after 1 iterations) memory (MB) | allocated: 5861.5498046875 | max allocated: 10450.46337890625 | reserved: 18868.0 | max reserved: 18868.0
[Rank 224] (after 1 iterations) memory (MB) | allocated: 7107.70751953125 | max allocated: 11884.6845703125 | reserved: 22492.0 | max reserved: 22492.0
[Rank 192] (after 1 iterations) memory (MB) | allocated: 5861.5498046875 | max allocated: 10450.46337890625 | reserved: 18868.0 | max reserved: 18868.0
[Rank 160] (after 1 iterations) memory (MB) | allocated: 5861.5498046875 | max allocated: 10450.46337890625 | reserved: 18868.0 | max reserved: 18868.0
[Rank 35] (after 1 iterations) memory (MB) | allocated: 5861.5498046875 | max allocated: 10450.46337890625 | reserved: 18442.0 | max reserved: 18442.0
[Rank 3] (after 1 iterations) memory (MB) | allocated: 6661.611328125 | max allocated: 11742.55810546875 | reserved: 21150.0 | max reserved: 21150.0
[Rank 67] (after 1 iterations) memory (MB) | allocated: 5861.5498046875 | max allocated: 10450.46337890625 | reserved: 18522.0 | max reserved: 18522.0
[Rank 99] (after 1 iterations) memory (MB) | allocated: 5861.5498046875 | max allocated: 10450.46337890625 | reserved: 18442.0 | max reserved: 18442.0
[Rank 131] (after 1 iterations) memory (MB) | allocated: 5861.5498046875 | max allocated: 10450.46337890625 | reserved: 18522.0 | max reserved: 18522.0
[Rank 227] (after 1 iterations) memory (MB) | allocated: 7107.70751953125 | max allocated: 11884.6845703125 | reserved: 21700.0 | max reserved: 21700.0
[Rank 195] (after 1 iterations) memory (MB) | allocated: 5861.5498046875 | max allocated: 10450.46337890625 | reserved: 18586.0 | max reserved: 18586.0
[Rank 163] (after 1 iterations) memory (MB) | allocated: 5861.5498046875 | max allocated: 10450.46337890625 | reserved: 18442.0 | max reserved: 18442.0
iteration 1/ 159576 | consumed samples: 16 | elapsed time per iteration (ms): 31536.2 | learning rate: 4.438E-09 | global batch size: 16 | lm loss: 1.426722E+01 | loss scale: 4096.0 | grad norm: 1863985.704 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2/ 159576 | consumed samples: 32 | elapsed time per iteration (ms): 13049.6 | learning rate: 8.876E-09 | global batch size: 16 | lm
loss: 1.429125E+01 | loss scale: 4096.0 | grad norm: 1882741.499 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3/ 159576 | consumed samples: 48 | elapsed time per iteration (ms): 13671.4 | learning rate: 1.331E-08 | global batch size: 16 | lm loss: 1.421026E+01 | loss scale: 4096.0 | grad norm: 1871916.438 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4/ 159576 | consumed samples: 64 | elapsed time per iteration (ms): 13544.5 | learning rate: 1.775E-08 | global batch size: 16 | lm loss: 1.424627E+01 | loss scale: 4096.0 | grad norm: 1912485.128 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5/ 159576 | consumed samples: 80 | elapsed time per iteration (ms): 13955.0 | learning rate: 2.219E-08 | global batch size: 16 | lm loss: 1.421161E+01 | loss scale: 4096.0 | grad norm: 1873991.265 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6/ 159576 | consumed samples: 96 | elapsed time per iteration (ms): 13725.9 | learning rate: 2.663E-08 | global batch size: 16 | lm loss: 1.423833E+01 | loss scale: 4096.0 | grad norm: 1889068.937 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7/ 159576 | consumed samples: 112 | elapsed time per iteration (ms): 13496.8 | learning rate: 3.107E-08 | global batch size: 16 | lm loss: 1.423929E+01 | loss scale: 4096.0 | grad norm: 1864001.655 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 8/ 159576 | consumed samples: 128 | elapsed time per iteration (ms): 13565.8 | learning rate: 3.550E-08 | global batch size: 16 | lm loss: 1.424760E+01 | loss scale: 4096.0 | grad norm: 1867381.949 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 9/ 159576 | consumed samples: 144 | elapsed time per iteration (ms): 14076.3 | learning rate: 3.994E-08 | global batch size: 16 | lm loss: 1.418199E+01 | loss scale: 4096.0 | grad norm: 1902029.931 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 10/ 159576 | consumed samples: 160 | elapsed time per iteration (ms): 13497.5 | learning rate: 4.438E-08 | global batch size: 16 | lm loss: 1.412427E+01 | loss scale: 4096.0 | grad norm: 1865649.234 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 11/ 159576 | consumed samples: 176 | elapsed time per iteration (ms): 13459.5 | learning rate: 4.882E-08 | global batch size: 16 | lm loss: 1.407386E+01 | loss scale: 4096.0 | grad norm: 1861067.628 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 12/ 159576 | consumed samples: 192 | elapsed time per iteration (ms): 13581.0 | learning rate: 5.325E-08 | global batch size: 16 | lm loss: 1.400436E+01 | loss scale: 4096.0 | grad norm: 1857208.659 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 13/ 159576 | consumed samples: 208 | elapsed time per iteration (ms): 13877.0 | learning rate: 5.769E-08 | global batch size: 16 | lm loss: 1.374212E+01 | loss scale: 4096.0 | grad norm: 1860712.228 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 14/ 159576 | consumed samples: 224 | elapsed time per iteration (ms): 13730.6 | learning rate: 6.213E-08 | global batch size: 16 | lm loss: 1.363158E+01 | loss scale: 4096.0 | grad norm: 1835837.890 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 15/ 159576 | consumed samples: 240 | elapsed time per iteration (ms): 13589.9 | learning rate: 6.657E-08 | global batch size: 16 | lm loss: 1.353429E+01 | loss scale: 4096.0 | grad norm: 1866742.342 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 16/ 159576 | consumed samples: 256 | elapsed time per iteration (ms): 13709.9 | learning rate: 7.101E-08 | global batch size: 16 | lm loss: 1.346230E+01 | loss scale: 4096.0 | grad norm: 1867848.322 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 17/ 159576 | consumed samples: 272 | elapsed time per iteration (ms): 13515.8 | learning rate: 7.544E-08 | global batch size: 16 | lm loss: 1.257517E+01 | loss scale: 4096.0 | grad norm: 1827444.965 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 18/ 159576 | consumed samples: 288 | elapsed time per iteration (ms): 13800.0 | learning rate: 7.988E-08 | global batch size: 16 | lm loss: 1.251998E+01 | loss scale: 4096.0 | grad norm: 2020558.797 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 19/ 159576 | consumed samples: 304 | elapsed time per iteration (ms): 13516.3 | learning rate: 8.432E-08 | global batch size: 16 | lm loss: 1.265157E+01 | loss scale: 4096.0 | grad norm: 2257407.748 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 20/ 159576 | consumed samples: 320 | elapsed time per iteration (ms): 13549.6 | learning rate: 8.876E-08 | global batch size: 16 | lm loss: 1.252521E+01 | loss scale: 4096.0 | grad norm: 2095375.557 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 21/ 159576 | consumed samples: 336 | elapsed time per iteration (ms): 13586.7 | learning rate: 9.320E-08 | global batch size: 16 | lm loss: 1.244903E+01 | loss scale: 4096.0 | grad norm: 2211855.540 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 22/ 159576 | consumed samples: 352 | elapsed time per iteration (ms): 14140.0 | learning rate: 9.763E-08 | global batch size: 16 | lm loss: 1.221426E+01 | loss scale: 4096.0 | grad norm: 2152853.946 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 23/ 159576 | consumed samples: 368 | elapsed time per iteration (ms): 13565.7 | learning rate: 1.021E-07 | global batch size: 16 | lm loss: 1.223387E+01 | loss scale: 4096.0 | grad norm: 2257726.245 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 24/ 159576 | consumed samples: 384 | elapsed time per iteration (ms): 13529.2 | learning rate: 1.065E-07 | global batch size: 16 | lm loss: 1.252795E+01 | loss scale: 4096.0 | grad norm: 2648402.060 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 25/ 159576 | consumed samples: 400 | elapsed time per iteration (ms): 13468.4 | learning rate: 1.109E-07 | global batch size: 16 | lm loss: 1.249682E+01 | loss scale: 4096.0 | grad norm: 2816711.826 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 26/ 159576 | consumed samples: 416 | elapsed time per iteration (ms): 13529.9 | learning rate: 1.154E-07 | global batch size: 16 | lm loss: 1.219784E+01 | loss scale: 4096.0 | grad norm: 2380750.659 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 27/ 159576 | consumed samples: 432 | elapsed time per iteration (ms): 13833.4 | learning rate: 1.198E-07 | global batch size: 16 | lm loss: 1.182601E+01 | loss scale: 4096.0 | grad norm: 2116005.650 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 28/ 159576 | consumed samples: 448 | elapsed time per iteration (ms): 13615.6 | learning rate: 1.243E-07 | global batch size: 16 | lm loss: 1.159655E+01 | loss scale: 4096.0 | grad norm: 1805209.516 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 29/ 159576 | consumed samples: 464 | elapsed time per iteration (ms): 13371.2 | learning rate: 1.287E-07 | global batch size: 16 | lm loss: 1.165552E+01 | loss scale: 4096.0 | grad norm: 1731569.615 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 30/ 159576 | consumed samples: 480 | elapsed time per iteration (ms): 13604.8 | learning rate: 1.331E-07 | global batch size: 16 | lm loss: 1.154380E+01 | loss scale: 4096.0 | grad norm: 1706578.844 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 31/ 159576 | consumed samples: 496 | elapsed time per iteration (ms): 13982.3 | learning rate: 1.376E-07 | global batch size: 16 | lm loss: 1.139362E+01 | loss scale: 4096.0 | grad norm: 1757980.169 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 32/ 159576 | consumed samples: 512 | elapsed time per iteration (ms): 13306.0 | learning rate: 1.420E-07 | global batch size: 16 | lm loss: 1.148209E+01 | loss scale: 4096.0 | grad norm: 1697993.336 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 33/ 159576 | consumed samples: 528 | elapsed time per iteration (ms): 13575.8 | learning rate: 1.464E-07 | global batch size: 16 | lm loss: 1.140995E+01 | loss scale: 4096.0 | grad norm: 1670562.081 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 34/ 159576 | consumed samples: 544 | elapsed time per iteration (ms): 13613.2 | learning rate: 1.509E-07 | global batch size: 16 | lm loss: 1.132776E+01 | loss scale: 4096.0 | grad norm: 1643305.715 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 35/ 159576 | consumed samples: 560 | elapsed time per iteration (ms): 13869.9 | learning rate: 1.553E-07 | global batch size: 16 | lm loss: 1.136237E+01 | loss scale: 4096.0 | grad norm: 1648846.360 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 36/ 159576 | consumed samples: 576 | elapsed time per iteration (ms): 13789.0 | learning rate: 1.598E-07 | global batch size: 16 | lm loss: 1.143323E+01 | loss scale: 4096.0 | grad norm: 1598861.192 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 37/ 159576 | consumed samples: 592 | elapsed time per iteration (ms): 13658.0 | learning rate: 1.642E-07 | global batch size: 16 | lm loss: 1.115875E+01 | loss scale: 4096.0 | grad norm: 1562919.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 38/ 159576 | consumed samples: 608 | elapsed time per iteration (ms): 13961.2 | learning rate: 1.686E-07 | global batch size: 16 | lm loss: 1.117768E+01 | loss scale: 4096.0 | grad norm: 1565543.705 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 39/ 159576 | consumed samples: 624 | elapsed time per iteration (ms): 13410.4 | learning rate: 1.731E-07 | global batch size: 16 | lm loss: 1.111340E+01 | loss scale: 4096.0 | grad norm: 1536768.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 40/ 159576 | consumed samples: 640 | elapsed time per iteration (ms): 13891.8 | learning rate: 1.775E-07 | global batch size: 16 | lm loss: 1.106657E+01 | loss scale: 4096.0 | grad norm: 1548421.837 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 41/ 159576 | consumed samples: 656 | elapsed time per iteration (ms): 13633.3 | learning rate: 1.820E-07 | global batch size: 16 | lm loss: 1.094995E+01 | loss scale: 4096.0 | grad norm: 1532446.839 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 42/ 159576 | consumed samples: 672 | elapsed time per iteration (ms): 13643.8 | learning rate: 1.864E-07 | global batch size: 16 | lm loss: 1.087856E+01 | loss scale: 4096.0 | grad norm: 1531337.842 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 43/ 159576 | consumed samples: 688 | elapsed time per iteration (ms): 13630.7 | learning rate: 1.908E-07 | global batch size: 16 | lm loss: 1.084412E+01 | loss scale: 4096.0 | grad norm: 1473539.326 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 44/ 159576 | consumed samples: 704 | elapsed time per iteration (ms): 14118.0 | learning rate: 1.953E-07 | global batch size: 16 | lm loss: 1.114596E+01 | loss scale: 4096.0 | grad norm: 1496700.678 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 45/ 159576 | consumed samples: 720 | elapsed time per iteration (ms): 13853.8 | learning rate: 1.997E-07 | global batch size: 16 | lm loss: 1.092829E+01 | loss scale: 4096.0 | grad norm: 1454980.052 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 46/ 159576 | consumed samples: 736 | elapsed time per iteration (ms): 13549.0 | learning rate: 2.041E-07 | global batch size: 16 | lm loss: 1.074461E+01 | loss scale: 4096.0 | grad norm: 1397083.505 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 47/ 159576 | consumed samples: 752 | elapsed time per iteration (ms): 13627.3 | learning rate: 2.086E-07 | global batch size: 16 | lm loss: 1.066580E+01 | loss scale: 4096.0 | grad norm: 1311670.870 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 48/ 159576 | consumed samples: 768 | elapsed time per iteration (ms): 13674.9 | learning rate: 2.130E-07 | global batch size: 16 | lm loss: 1.055744E+01 | loss scale: 4096.0 | grad norm: 1292299.744 | num zeros: 0.0 | number of skipped
iterations: 0 | number of nan iterations: 0 | time (ms) iteration 49/ 159576 | consumed samples: 784 | elapsed time per iteration (ms): 13932.1 | learning rate: 2.175E-07 | global batch size: 16 | lm loss: 1.060610E+01 | loss scale: 4096.0 | grad norm: 1283482.631 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 50/ 159576 | consumed samples: 800 | elapsed time per iteration (ms): 13665.9 | learning rate: 2.219E-07 | global batch size: 16 | lm loss: 1.063007E+01 | loss scale: 4096.0 | grad norm: 1228203.240 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 51/ 159576 | consumed samples: 816 | elapsed time per iteration (ms): 13667.5 | learning rate: 2.263E-07 | global batch size: 16 | lm loss: 1.046357E+01 | loss scale: 4096.0 | grad norm: 1219490.568 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 52/ 159576 | consumed samples: 832 | elapsed time per iteration (ms): 13793.6 | learning rate: 2.308E-07 | global batch size: 16 | lm loss: 1.061804E+01 | loss scale: 4096.0 | grad norm: 1197068.783 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 53/ 159576 | consumed samples: 848 | elapsed time per iteration (ms): 14209.6 | learning rate: 2.352E-07 | global batch size: 16 | lm loss: 1.041930E+01 | loss scale: 4096.0 | grad norm: 1168890.772 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 54/ 159576 | consumed samples: 864 | elapsed time per iteration (ms): 13453.2 | learning rate: 2.396E-07 | global batch size: 16 | lm loss: 1.035855E+01 | loss scale: 4096.0 | grad norm: 1126594.517 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 55/ 159576 | consumed samples: 880 | elapsed time per iteration (ms): 13666.6 | learning rate: 2.441E-07 | global batch 
size: 16 | lm loss: 1.051081E+01 | loss scale: 4096.0 | grad norm: 1080949.187 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 56/ 159576 | consumed samples: 896 | elapsed time per iteration (ms): 13689.5 | learning rate: 2.485E-07 | global batch size: 16 | lm loss: 1.048364E+01 | loss scale: 4096.0 | grad norm: 1069119.479 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 57/ 159576 | consumed samples: 912 | elapsed time per iteration (ms): 14289.6 | learning rate: 2.530E-07 | global batch size: 16 | lm loss: 1.048154E+01 | loss scale: 4096.0 | grad norm: 1016407.938 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 58/ 159576 | consumed samples: 928 | elapsed time per iteration (ms): 13663.2 | learning rate: 2.574E-07 | global batch size: 16 | lm loss: 1.019213E+01 | loss scale: 4096.0 | grad norm: 982402.590 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 59/ 159576 | consumed samples: 944 | elapsed time per iteration (ms): 13704.5 | learning rate: 2.618E-07 | global batch size: 16 | lm loss: 1.019982E+01 | loss scale: 4096.0 | grad norm: 965254.453 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 60/ 159576 | consumed samples: 960 | elapsed time per iteration (ms): 13846.3 | learning rate: 2.663E-07 | global batch size: 16 | lm loss: 1.021626E+01 | loss scale: 4096.0 | grad norm: 926021.764 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 61/ 159576 | consumed samples: 976 | elapsed time per iteration (ms): 13469.9 | learning rate: 2.707E-07 | global batch size: 16 | lm loss: 1.008368E+01 | loss scale: 4096.0 | grad norm: 911608.476 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 62/ 
159576 | consumed samples: 992 | elapsed time per iteration (ms): 13774.9 | learning rate: 2.751E-07 | global batch size: 16 | lm loss: 9.892099E+00 | loss scale: 4096.0 | grad norm: 882114.442 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 63/ 159576 | consumed samples: 1008 | elapsed time per iteration (ms): 13514.1 | learning rate: 2.796E-07 | global batch size: 16 | lm loss: 9.876393E+00 | loss scale: 4096.0 | grad norm: 834416.962 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 64/ 159576 | consumed samples: 1024 | elapsed time per iteration (ms): 13538.5 | learning rate: 2.840E-07 | global batch size: 16 | lm loss: 9.927294E+00 | loss scale: 4096.0 | grad norm: 814691.882 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 65/ 159576 | consumed samples: 1040 | elapsed time per iteration (ms): 13496.5 | learning rate: 2.885E-07 | global batch size: 16 | lm loss: 1.024293E+01 | loss scale: 4096.0 | grad norm: 821175.330 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 66/ 159576 | consumed samples: 1056 | elapsed time per iteration (ms): 14030.7 | learning rate: 2.929E-07 | global batch size: 16 | lm loss: 9.930872E+00 | loss scale: 4096.0 | grad norm: 759629.854 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 67/ 159576 | consumed samples: 1072 | elapsed time per iteration (ms): 13743.1 | learning rate: 2.973E-07 | global batch size: 16 | lm loss: 9.852800E+00 | loss scale: 4096.0 | grad norm: 734440.980 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 68/ 159576 | consumed samples: 1088 | elapsed time per iteration (ms): 13293.2 | learning rate: 3.018E-07 | global batch size: 16 | lm loss: 9.786448E+00 | loss scale: 4096.0 | grad norm: 
702591.247 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 69/ 159576 | consumed samples: 1104 | elapsed time per iteration (ms): 13515.6 | learning rate: 3.062E-07 | global batch size: 16 | lm loss: 9.917148E+00 | loss scale: 4096.0 | grad norm: 689937.545 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 70/ 159576 | consumed samples: 1120 | elapsed time per iteration (ms): 13786.0 | learning rate: 3.107E-07 | global batch size: 16 | lm loss: 9.593161E+00 | loss scale: 4096.0 | grad norm: 634541.803 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 71/ 159576 | consumed samples: 1136 | elapsed time per iteration (ms): 13761.6 | learning rate: 3.151E-07 | global batch size: 16 | lm loss: 9.685747E+00 | loss scale: 4096.0 | grad norm: 620089.160 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 72/ 159576 | consumed samples: 1152 | elapsed time per iteration (ms): 13503.1 | learning rate: 3.195E-07 | global batch size: 16 | lm loss: 9.550736E+00 | loss scale: 4096.0 | grad norm: 592735.898 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 73/ 159576 | consumed samples: 1168 | elapsed time per iteration (ms): 13574.6 | learning rate: 3.240E-07 | global batch size: 16 | lm loss: 9.780053E+00 | loss scale: 4096.0 | grad norm: 578902.468 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 74/ 159576 | consumed samples: 1184 | elapsed time per iteration (ms): 13563.6 | learning rate: 3.284E-07 | global batch size: 16 | lm loss: 9.660094E+00 | loss scale: 4096.0 | grad norm: 549632.302 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 75/ 159576 | consumed samples: 1200 | elapsed time per iteration (ms): 
13751.3 | learning rate: 3.328E-07 | global batch size: 16 | lm loss: 9.715110E+00 | loss scale: 4096.0 | grad norm: 523457.012 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 76/ 159576 | consumed samples: 1216 | elapsed time per iteration (ms): 13613.9 | learning rate: 3.373E-07 | global batch size: 16 | lm loss: 9.548697E+00 | loss scale: 4096.0 | grad norm: 559789.568 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 77/ 159576 | consumed samples: 1232 | elapsed time per iteration (ms): 13668.9 | learning rate: 3.417E-07 | global batch size: 16 | lm loss: 9.395579E+00 | loss scale: 4096.0 | grad norm: 516053.141 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 78/ 159576 | consumed samples: 1248 | elapsed time per iteration (ms): 13540.8 | learning rate: 3.462E-07 | global batch size: 16 | lm loss: 9.450207E+00 | loss scale: 4096.0 | grad norm: 491518.990 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 79/ 159576 | consumed samples: 1264 | elapsed time per iteration (ms): 13951.5 | learning rate: 3.506E-07 | global batch size: 16 | lm loss: 9.312221E+00 | loss scale: 4096.0 | grad norm: 445025.682 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 80/ 159576 | consumed samples: 1280 | elapsed time per iteration (ms): 13710.1 | learning rate: 3.550E-07 | global batch size: 16 | lm loss: 9.362122E+00 | loss scale: 4096.0 | grad norm: 498046.459 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 81/ 159576 | consumed samples: 1296 | elapsed time per iteration (ms): 13653.8 | learning rate: 3.595E-07 | global batch size: 16 | lm loss: 9.684261E+00 | loss scale: 4096.0 | grad norm: 460137.704 | num zeros: 0.0 | number of skipped iterations: 0 | number 
of nan iterations: 0 | time (ms) iteration 82/ 159576 | consumed samples: 1312 | elapsed time per iteration (ms): 13416.1 | learning rate: 3.639E-07 | global batch size: 16 | lm loss: 9.111031E+00 | loss scale: 4096.0 | grad norm: 462196.098 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 83/ 159576 | consumed samples: 1328 | elapsed time per iteration (ms): 13589.7 | learning rate: 3.683E-07 | global batch size: 16 | lm loss: 9.424231E+00 | loss scale: 4096.0 | grad norm: 387492.278 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 84/ 159576 | consumed samples: 1344 | elapsed time per iteration (ms): 13890.8 | learning rate: 3.728E-07 | global batch size: 16 | lm loss: 9.225885E+00 | loss scale: 4096.0 | grad norm: 477146.862 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 85/ 159576 | consumed samples: 1360 | elapsed time per iteration (ms): 13578.1 | learning rate: 3.772E-07 | global batch size: 16 | lm loss: 9.449253E+00 | loss scale: 4096.0 | grad norm: 498838.088 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 86/ 159576 | consumed samples: 1376 | elapsed time per iteration (ms): 13600.8 | learning rate: 3.817E-07 | global batch size: 16 | lm loss: 9.186915E+00 | loss scale: 4096.0 | grad norm: 359821.133 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 87/ 159576 | consumed samples: 1392 | elapsed time per iteration (ms): 13578.0 | learning rate: 3.861E-07 | global batch size: 16 | lm loss: 9.169426E+00 | loss scale: 4096.0 | grad norm: 336361.334 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 88/ 159576 | consumed samples: 1408 | elapsed time per iteration (ms): 14258.1 | learning rate: 3.905E-07 | global batch size: 16 | lm loss: 
9.174639E+00 | loss scale: 4096.0 | grad norm: 513262.304 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 89/ 159576 | consumed samples: 1424 | elapsed time per iteration (ms): 13350.5 | learning rate: 3.950E-07 | global batch size: 16 | lm loss: 9.322023E+00 | loss scale: 4096.0 | grad norm: 417913.413 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 90/ 159576 | consumed samples: 1440 | elapsed time per iteration (ms): 13582.0 | learning rate: 3.994E-07 | global batch size: 16 | lm loss: 9.319530E+00 | loss scale: 4096.0 | grad norm: 326159.953 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 91/ 159576 | consumed samples: 1456 | elapsed time per iteration (ms): 13577.6 | learning rate: 4.038E-07 | global batch size: 16 | lm loss: 9.305362E+00 | loss scale: 4096.0 | grad norm: 312504.506 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 92/ 159576 | consumed samples: 1472 | elapsed time per iteration (ms): 13979.9 | learning rate: 4.083E-07 | global batch size: 16 | lm loss: 8.797226E+00 | loss scale: 4096.0 | grad norm: 299274.584 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 93/ 159576 | consumed samples: 1488 | elapsed time per iteration (ms): 13685.6 | learning rate: 4.127E-07 | global batch size: 16 | lm loss: 9.470177E+00 | loss scale: 4096.0 | grad norm: 889931.672 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 94/ 159576 | consumed samples: 1504 | elapsed time per iteration (ms): 13625.1 | learning rate: 4.172E-07 | global batch size: 16 | lm loss: 9.601658E+00 | loss scale: 4096.0 | grad norm: 858157.270 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 95/ 159576 | consumed 
samples: 1520 | elapsed time per iteration (ms): 13713.7 | learning rate: 4.216E-07 | global batch size: 16 | lm loss: 9.093191E+00 | loss scale: 4096.0 | grad norm: 308888.782 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 96/ 159576 | consumed samples: 1536 | elapsed time per iteration (ms): 13441.7 | learning rate: 4.260E-07 | global batch size: 16 | lm loss: 9.258781E+00 | loss scale: 4096.0 | grad norm: 285375.841 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 97/ 159576 | consumed samples: 1552 | elapsed time per iteration (ms): 13952.1 | learning rate: 4.305E-07 | global batch size: 16 | lm loss: 9.267257E+00 | loss scale: 4096.0 | grad norm: 266598.437 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 98/ 159576 | consumed samples: 1568 | elapsed time per iteration (ms): 13570.4 | learning rate: 4.349E-07 | global batch size: 16 | lm loss: 9.302748E+00 | loss scale: 4096.0 | grad norm: 430050.353 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 99/ 159576 | consumed samples: 1584 | elapsed time per iteration (ms): 13655.7 | learning rate: 4.393E-07 | global batch size: 16 | lm loss: 9.206352E+00 | loss scale: 4096.0 | grad norm: 522965.120 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 100/ 159576 | consumed samples: 1600 | elapsed time per iteration (ms): 13606.3 | learning rate: 4.438E-07 | global batch size: 16 | lm loss: 9.212991E+00 | loss scale: 4096.0 | grad norm: 351294.826 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 101/ 159576 | consumed samples: 1616 | elapsed time per iteration (ms): 14021.3 | learning rate: 4.482E-07 | global batch size: 16 | lm loss: 9.392309E+00 | loss scale: 4096.0 | grad norm: 249407.405 | num 
zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 102/ 159576 | consumed samples: 1632 | elapsed time per iteration (ms): 13722.5 | learning rate: 4.527E-07 | global batch size: 16 | lm loss: 9.173745E+00 | loss scale: 4096.0 | grad norm: 230190.700 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 103/ 159576 | consumed samples: 1648 | elapsed time per iteration (ms): 13481.3 | learning rate: 4.571E-07 | global batch size: 16 | lm loss: 9.060183E+00 | loss scale: 4096.0 | grad norm: 535519.642 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 104/ 159576 | consumed samples: 1664 | elapsed time per iteration (ms): 13573.2 | learning rate: 4.615E-07 | global batch size: 16 | lm loss: 8.820353E+00 | loss scale: 4096.0 | grad norm: 252106.297 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 105/ 159576 | consumed samples: 1680 | elapsed time per iteration (ms): 13679.8 | learning rate: 4.660E-07 | global batch size: 16 | lm loss: 8.907228E+00 | loss scale: 4096.0 | grad norm: 227304.496 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 106/ 159576 | consumed samples: 1696 | elapsed time per iteration (ms): 13833.6 | learning rate: 4.704E-07 | global batch size: 16 | lm loss: 8.920894E+00 | loss scale: 4096.0 | grad norm: 226622.044 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 107/ 159576 | consumed samples: 1712 | elapsed time per iteration (ms): 13577.9 | learning rate: 4.749E-07 | global batch size: 16 | lm loss: 8.839094E+00 | loss scale: 4096.0 | grad norm: 188033.687 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 108/ 159576 | consumed samples: 1728 | elapsed time per iteration (ms): 13620.7 | 
learning rate: 4.793E-07 | global batch size: 16 | lm loss: 9.072345E+00 | loss scale: 4096.0 | grad norm: 405511.072 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 109/ 159576 | consumed samples: 1744 | elapsed time per iteration (ms): 13608.5 | learning rate: 4.837E-07 | global batch size: 16 | lm loss: 8.981932E+00 | loss scale: 4096.0 | grad norm: 326365.949 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 110/ 159576 | consumed samples: 1760 | elapsed time per iteration (ms): 13945.7 | learning rate: 4.882E-07 | global batch size: 16 | lm loss: 8.900158E+00 | loss scale: 4096.0 | grad norm: 183771.399 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 111/ 159576 | consumed samples: 1776 | elapsed time per iteration (ms): 13542.6 | learning rate: 4.926E-07 | global batch size: 16 | lm loss: 8.908926E+00 | loss scale: 4096.0 | grad norm: 189581.109 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 112/ 159576 | consumed samples: 1792 | elapsed time per iteration (ms): 13715.6 | learning rate: 4.970E-07 | global batch size: 16 | lm loss: 8.738115E+00 | loss scale: 4096.0 | grad norm: 176974.824 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 113/ 159576 | consumed samples: 1808 | elapsed time per iteration (ms): 13456.9 | learning rate: 5.015E-07 | global batch size: 16 | lm loss: 9.185429E+00 | loss scale: 4096.0 | grad norm: 452577.591 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 114/ 159576 | consumed samples: 1824 | elapsed time per iteration (ms): 14039.5 | learning rate: 5.059E-07 | global batch size: 16 | lm loss: 9.235853E+00 | loss scale: 4096.0 | grad norm: 567475.961 | num zeros: 0.0 | number of skipped iterations: 0 | number of 
nan iterations: 0 | time (ms) iteration 115/ 159576 | consumed samples: 1840 | elapsed time per iteration (ms): 13568.6 | learning rate: 5.104E-07 | global batch size: 16 | lm loss: 8.848898E+00 | loss scale: 4096.0 | grad norm: 182062.035 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 116/ 159576 | consumed samples: 1856 | elapsed time per iteration (ms): 13607.1 | learning rate: 5.148E-07 | global batch size: 16 | lm loss: 8.955499E+00 | loss scale: 4096.0 | grad norm: 179172.056 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 117/ 159576 | consumed samples: 1872 | elapsed time per iteration (ms): 13798.7 | learning rate: 5.192E-07 | global batch size: 16 | lm loss: 8.835221E+00 | loss scale: 4096.0 | grad norm: 168846.925 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 118/ 159576 | consumed samples: 1888 | elapsed time per iteration (ms): 13424.3 | learning rate: 5.237E-07 | global batch size: 16 | lm loss: 9.120043E+00 | loss scale: 4096.0 | grad norm: 304218.818 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 119/ 159576 | consumed samples: 1904 | elapsed time per iteration (ms): 13992.7 | learning rate: 5.281E-07 | global batch size: 16 | lm loss: 8.877877E+00 | loss scale: 4096.0 | grad norm: 328004.326 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 120/ 159576 | consumed samples: 1920 | elapsed time per iteration (ms): 13739.9 | learning rate: 5.325E-07 | global batch size: 16 | lm loss: 9.091492E+00 | loss scale: 4096.0 | grad norm: 542667.397 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 121/ 159576 | consumed samples: 1936 | elapsed time per iteration (ms): 13438.9 | learning rate: 5.370E-07 | global batch size: 16 | lm loss: 
8.963889E+00 | loss scale: 4096.0 | grad norm: 173633.066 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 122/ 159576 | consumed samples: 1952 | elapsed time per iteration (ms): 13659.9 | learning rate: 5.414E-07 | global batch size: 16 | lm loss: 8.973601E+00 | loss scale: 4096.0 | grad norm: 154883.483 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 123/ 159576 | consumed samples: 1968 | elapsed time per iteration (ms): 14034.9 | learning rate: 5.459E-07 | global batch size: 16 | lm loss: 8.932154E+00 | loss scale: 4096.0 | grad norm: 191305.172 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 124/ 159576 | consumed samples: 1984 | elapsed time per iteration (ms): 13642.6 | learning rate: 5.503E-07 | global batch size: 16 | lm loss: 8.718765E+00 | loss scale: 4096.0 | grad norm: 141927.967 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 125/ 159576 | consumed samples: 2000 | elapsed time per iteration (ms): 13607.3 | learning rate: 5.547E-07 | global batch size: 16 | lm loss: 9.022717E+00 | loss scale: 4096.0 | grad norm: 530230.902 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 126/ 159576 | consumed samples: 2016 | elapsed time per iteration (ms): 13623.2 | learning rate: 5.592E-07 | global batch size: 16 | lm loss: 9.160154E+00 | loss scale: 4096.0 | grad norm: 525377.320 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 127/ 159576 | consumed samples: 2032 | elapsed time per iteration (ms): 13944.5 | learning rate: 5.636E-07 | global batch size: 16 | lm loss: 8.602621E+00 | loss scale: 4096.0 | grad norm: 180832.122 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 128/ 159576 | 
consumed samples: 2048 | elapsed time per iteration (ms): 13652.1 | learning rate: 5.680E-07 | global batch size: 16 | lm loss: 8.848473E+00 | loss scale: 4096.0 | grad norm: 159006.909 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 129/ 159576 | consumed samples: 2064 | elapsed time per iteration (ms): 13619.4 | learning rate: 5.725E-07 | global batch size: 16 | lm loss: 8.697285E+00 | loss scale: 4096.0 | grad norm: 166208.955 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 130/ 159576 | consumed samples: 2080 | elapsed time per iteration (ms): 13649.8 | learning rate: 5.769E-07 | global batch size: 16 | lm loss: 8.738346E+00 | loss scale: 4096.0 | grad norm: 142582.672 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 131/ 159576 | consumed samples: 2096 | elapsed time per iteration (ms): 13648.8 | learning rate: 5.814E-07 | global batch size: 16 | lm loss: 8.628532E+00 | loss scale: 4096.0 | grad norm: 119745.012 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 132/ 159576 | consumed samples: 2112 | elapsed time per iteration (ms): 13855.7 | learning rate: 5.858E-07 | global batch size: 16 | lm loss: 8.681314E+00 | loss scale: 4096.0 | grad norm: 238581.530 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 133/ 159576 | consumed samples: 2128 | elapsed time per iteration (ms): 13614.3 | learning rate: 5.902E-07 | global batch size: 16 | lm loss: 8.853155E+00 | loss scale: 4096.0 | grad norm: 190597.797 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 134/ 159576 | consumed samples: 2144 | elapsed time per iteration (ms): 13742.8 | learning rate: 5.947E-07 | global batch size: 16 | lm loss: 8.840850E+00 | loss scale: 4096.0 | grad norm: 
157001.058 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 135/ 159576 | consumed samples: 2160 | elapsed time per iteration (ms): 13481.4 | learning rate: 5.991E-07 | global batch size: 16 | lm loss: 8.721090E+00 | loss scale: 4096.0 | grad norm: 120761.062 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 136/ 159576 | consumed samples: 2176 | elapsed time per iteration (ms): 14037.0 | learning rate: 6.036E-07 | global batch size: 16 | lm loss: 8.786610E+00 | loss scale: 4096.0 | grad norm: 109166.988 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 137/ 159576 | consumed samples: 2192 | elapsed time per iteration (ms): 13631.2 | learning rate: 6.080E-07 | global batch size: 16 | lm loss: 8.825349E+00 | loss scale: 4096.0 | grad norm: 393039.207 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 138/ 159576 | consumed samples: 2208 | elapsed time per iteration (ms): 13698.2 | learning rate: 6.124E-07 | global batch size: 16 | lm loss: 8.681873E+00 | loss scale: 4096.0 | grad norm: 210924.024 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 139/ 159576 | consumed samples: 2224 | elapsed time per iteration (ms): 13641.8 | learning rate: 6.169E-07 | global batch size: 16 | lm loss: 8.758416E+00 | loss scale: 4096.0 | grad norm: 111138.195 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 140/ 159576 | consumed samples: 2240 | elapsed time per iteration (ms): 13650.3 | learning rate: 6.213E-07 | global batch size: 16 | lm loss: 8.646829E+00 | loss scale: 4096.0 | grad norm: 115663.463 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 141/ 159576 | consumed samples: 2256 | elapsed time per iteration (ms): 14097.3 | learning rate: 6.257E-07 | global batch size: 16 | lm loss: 8.653087E+00 | loss scale: 4096.0 | grad norm: 142126.653 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 142/ 159576 | consumed samples: 2272 | elapsed time per iteration (ms): 13468.2 | learning rate: 6.302E-07 | global batch size: 16 | lm loss: 8.647311E+00 | loss scale: 4096.0 | grad norm: 163914.852 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 143/ 159576 | consumed samples: 2288 | elapsed time per iteration (ms): 13544.7 | learning rate: 6.346E-07 | global batch size: 16 | lm loss: 8.564240E+00 | loss scale: 4096.0 | grad norm: 159952.939 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 144/ 159576 | consumed samples: 2304 | elapsed time per iteration (ms): 13642.1 | learning rate: 6.391E-07 | global batch size: 16 | lm loss: 8.789017E+00 | loss scale: 4096.0 | grad norm: 169255.588 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 145/ 159576 | consumed samples: 2320 | elapsed time per iteration (ms): 14181.4 | learning rate: 6.435E-07 | global batch size: 16 | lm loss: 8.811962E+00 | loss scale: 4096.0 | grad norm: 127162.884 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 146/ 159576 | consumed samples: 2336 | elapsed time per iteration (ms): 13492.3 | learning rate: 6.479E-07 | global batch size: 16 | lm loss: 8.774818E+00 | loss scale: 4096.0 | grad norm: 110483.274 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 147/ 159576 | consumed samples: 2352 | elapsed time per iteration (ms): 13671.3 | learning rate: 6.524E-07 | global batch size: 16 | lm loss: 8.753700E+00 | loss scale: 4096.0 | grad norm: 128181.260 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 148/ 159576 | consumed samples: 2368 | elapsed time per iteration (ms): 13675.0 | learning rate: 6.568E-07 | global batch size: 16 | lm loss: 8.742964E+00 | loss scale: 4096.0 | grad norm: 140698.611 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 149/ 159576 | consumed samples: 2384 | elapsed time per iteration (ms): 14154.8 | learning rate: 6.612E-07 | global batch size: 16 | lm loss: 8.705631E+00 | loss scale: 4096.0 | grad norm: 284561.708 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 150/ 159576 | consumed samples: 2400 | elapsed time per iteration (ms): 13301.3 | learning rate: 6.657E-07 | global batch size: 16 | lm loss: 8.639321E+00 | loss scale: 4096.0 | grad norm: 158457.469 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 151/ 159576 | consumed samples: 2416 | elapsed time per iteration (ms): 13553.4 | learning rate: 6.701E-07 | global batch size: 16 | lm loss: 8.747204E+00 | loss scale: 4096.0 | grad norm: 217035.827 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 152/ 159576 | consumed samples: 2432 | elapsed time per iteration (ms): 13577.6 | learning rate: 6.746E-07 | global batch size: 16 | lm loss: 8.711011E+00 | loss scale: 4096.0 | grad norm: 170149.010 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 153/ 159576 | consumed samples: 2448 | elapsed time per iteration (ms): 13522.0 | learning rate: 6.790E-07 | global batch size: 16 | lm loss: 8.717499E+00 | loss scale: 4096.0 | grad norm: 103133.580 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 154/ 159576 | consumed samples: 2464 | elapsed time per iteration (ms): 13883.8 | learning rate: 6.834E-07 | global batch size: 16 | lm loss: 8.587013E+00 | loss scale: 4096.0 | grad norm: 99765.078 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 155/ 159576 | consumed samples: 2480 | elapsed time per iteration (ms): 13554.0 | learning rate: 6.879E-07 | global batch size: 16 | lm loss: 8.698885E+00 | loss scale: 4096.0 | grad norm: 282680.223 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 156/ 159576 | consumed samples: 2496 | elapsed time per iteration (ms): 13692.4 | learning rate: 6.923E-07 | global batch size: 16 | lm loss: 9.289864E+00 | loss scale: 4096.0 | grad norm: 609278.865 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 157/ 159576 | consumed samples: 2512 | elapsed time per iteration (ms): 13306.0 | learning rate: 6.967E-07 | global batch size: 16 | lm loss: 8.803203E+00 | loss scale: 4096.0 | grad norm: 221182.708 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
[2021-09-24 02:48:11] PULSE: tr8-104B is waiting to be scheduled (1159457_[1-10%1] on 'gpu_p13' partition)
[2021-09-24 02:48:11] PULSE: tr8-104B is scheduled to start in 18:26:36 (at 2021-09-24T21:14:48) (1161605 on 'gpu_p13' partition)
[2021-09-24 02:48:11] PULSE: tr8-104B is running for 37:09 since 2021-09-24T02:11:02 (1161730 on 'gpu_p13' partition (r6i4n7,r6i5n[7-8],r6i6n[0,6,8],r6i7n3,r7i2n[2,4-5],r7i3n2,r7i6n[2-4],r7i7n[3,7-8],r8i0n[2-3,5-8],r8i1n[0,2-4],r8i3n[0-2],r8i5n[3-4],r8i7n[3-6,8],r9i0n[0-2],r9i1n[0-3],r9i2n[3-5,8],r9i3n[0-1,7-8],r9i4n[0-2],r9i5n[3-8],r9i6n[0,7-8])
iteration 158/ 159576 | consumed samples: 2528 | elapsed time per iteration (ms): 13873.2 | learning rate: 7.012E-07 | global batch size: 16 | lm loss: 8.628306E+00 | loss scale: 4096.0 | grad norm: 200507.061 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 159/ 159576 | consumed samples: 2544 | elapsed time per iteration (ms): 13466.2 | learning rate: 7.056E-07 | global batch size: 16 | lm loss: 8.632781E+00 | loss scale: 4096.0 | grad norm: 103638.607 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 160/ 159576 | consumed samples: 2560 | elapsed time per iteration (ms): 13494.3 | learning rate: 7.101E-07 | global batch size: 16 | lm loss: 8.596104E+00 | loss scale: 4096.0 | grad norm: 92105.558 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 161/ 159576 | consumed samples: 2576 | elapsed time per iteration (ms): 13517.5 | learning rate: 7.145E-07 | global batch size: 16 | lm loss: 8.408714E+00 | loss scale: 4096.0 | grad norm: 78965.627 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 162/ 159576 | consumed samples: 2592 | elapsed time per iteration (ms): 13540.1 | learning rate: 7.189E-07 | global batch size: 16 | lm loss: 9.134837E+00 | loss scale: 4096.0 | grad norm: 524949.559 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 163/ 159576 | consumed samples: 2608 | elapsed time per iteration (ms): 13879.1 | learning rate: 7.234E-07 | global batch size: 16 | lm loss: 8.601346E+00 | loss scale: 4096.0 | grad norm: 206465.490 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 164/ 159576 | consumed samples: 2624 | elapsed time per iteration (ms): 13564.5 | learning rate: 7.278E-07 | global batch size: 16 | lm loss: 8.734079E+00 | loss scale: 4096.0 | grad norm: 159985.137 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 165/ 159576 | consumed samples: 2640 | elapsed time per iteration (ms): 13607.4 | learning rate: 7.322E-07 | global batch size: 16 | lm loss: 8.629238E+00 | loss scale: 4096.0 | grad norm: 89678.564 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 166/ 159576 | consumed samples: 2656 | elapsed time per iteration (ms): 13687.7 | learning rate: 7.367E-07 | global batch size: 16 | lm loss: 8.753635E+00 | loss scale: 4096.0 | grad norm: 108761.613 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 167/ 159576 | consumed samples: 2672 | elapsed time per iteration (ms): 14101.4 | learning rate: 7.411E-07 | global batch size: 16 | lm loss: 8.647141E+00 | loss scale: 4096.0 | grad norm: 78778.670 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 168/ 159576 | consumed samples: 2688 | elapsed time per iteration (ms): 13827.5 | learning rate: 7.456E-07 | global batch size: 16 | lm loss: 8.838135E+00 | loss scale: 4096.0 | grad norm: 301360.421 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 169/ 159576 | consumed samples: 2704 | elapsed time per iteration (ms): 13776.5 | learning rate: 7.500E-07 | global batch size: 16 | lm loss: 8.865972E+00 | loss scale: 4096.0 | grad norm: 230779.992 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 170/ 159576 | consumed samples: 2720 | elapsed time per iteration (ms): 13667.3 | learning rate: 7.544E-07 | global batch size: 16 | lm loss: 8.716210E+00 | loss scale: 4096.0 | grad norm: 133087.211 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 171/ 159576 | consumed samples: 2736 | elapsed time per iteration (ms): 13974.1 | learning rate: 7.589E-07 | global batch size: 16 | lm loss: 8.726005E+00 | loss scale: 4096.0 | grad norm: 112595.632 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 172/ 159576 | consumed samples: 2752 | elapsed time per iteration (ms): 13644.3 | learning rate: 7.633E-07 | global batch size: 16 | lm loss: 8.704071E+00 | loss scale: 4096.0 | grad norm: 92111.748 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 173/ 159576 | consumed samples: 2768 | elapsed time per iteration (ms): 13586.4 | learning rate: 7.678E-07 | global batch size: 16 | lm loss: 8.823001E+00 | loss scale: 4096.0 | grad norm: 93068.020 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 174/ 159576 | consumed samples: 2784 | elapsed time per iteration (ms): 13629.3 | learning rate: 7.722E-07 | global batch size: 16 | lm loss: 8.521597E+00 | loss scale: 4096.0 | grad norm: 79887.666 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 175/ 159576 | consumed samples: 2800 | elapsed time per iteration (ms): 13647.0 | learning rate: 7.766E-07 | global batch size: 16 | lm loss: 9.370278E+00 | loss scale: 4096.0 | grad norm: 576797.121 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 176/ 159576 | consumed samples: 2816 | elapsed time per iteration (ms): 13993.8 | learning rate: 7.811E-07 | global batch size: 16 | lm loss: 9.255205E+00 | loss scale: 4096.0 | grad norm: 337846.372 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 177/ 159576 | consumed samples: 2832 | elapsed time per iteration (ms): 13778.2 | learning rate: 7.855E-07 | global batch size: 16 | lm loss: 9.038449E+00 | loss scale: 4096.0 | grad norm: 339366.601 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 178/ 159576 | consumed samples: 2848 | elapsed time per iteration (ms): 13515.3 | learning rate: 7.899E-07 | global batch size: 16 | lm loss: 8.771539E+00 | loss scale: 4096.0 | grad norm: 216761.610 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 179/ 159576 | consumed samples: 2864 | elapsed time per iteration (ms): 13657.6 | learning rate: 7.944E-07 | global batch size: 16 | lm loss: 8.718536E+00 | loss scale: 4096.0 | grad norm: 103470.129 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 180/ 159576 | consumed samples: 2880 | elapsed time per iteration (ms): 14095.5 | learning rate: 7.988E-07 | global batch size: 16 | lm loss: 8.968449E+00 | loss scale: 4096.0 | grad norm: 88300.652 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 181/ 159576 | consumed samples: 2896 | elapsed time per iteration (ms): 13570.0 | learning rate: 8.033E-07 | global batch size: 16 | lm loss: 8.743597E+00 | loss scale: 4096.0 | grad norm: 73637.354 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 182/ 159576 | consumed samples: 2912 | elapsed time per iteration (ms): 13631.2 | learning rate: 8.077E-07 | global batch size: 16 | lm loss: 8.650385E+00 | loss scale: 4096.0 | grad norm: 170612.165 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 183/ 159576 | consumed samples: 2928 | elapsed time per iteration (ms): 13666.1 | learning rate: 8.121E-07 | global batch size: 16 | lm loss: 8.764441E+00 | loss scale: 4096.0 | grad norm: 157032.537 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 184/ 159576 | consumed samples: 2944 | elapsed time per iteration (ms): 14033.7 | learning rate: 8.166E-07 | global batch size: 16 | lm loss: 8.546231E+00 | loss scale: 4096.0 | grad norm: 68818.140 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 185/ 159576 | consumed samples: 2960 | elapsed time per iteration (ms): 13755.2 | learning rate: 8.210E-07 | global batch size: 16 | lm loss: 8.605597E+00 | loss scale: 4096.0 | grad norm: 245599.472 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 186/ 159576 | consumed samples: 2976 | elapsed time per iteration (ms): 13693.9 | learning rate: 8.254E-07 | global batch size: 16 | lm loss: 8.735710E+00 | loss scale: 4096.0 | grad norm: 193090.020 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 187/ 159576 | consumed samples: 2992 | elapsed time per iteration (ms): 13666.7 | learning rate: 8.299E-07 | global batch size: 16 | lm loss: 8.800616E+00 | loss scale: 4096.0 | grad norm: 121643.211 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 188/ 159576 | consumed samples: 3008 | elapsed time per iteration (ms): 13617.1 | learning rate: 8.343E-07 | global batch size: 16 | lm loss: 8.450140E+00 | loss scale: 4096.0 | grad norm: 91010.312 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 189/ 159576 | consumed samples: 3024 | elapsed time per iteration (ms): 14107.4 | learning rate: 8.388E-07 | global batch size: 16 | lm loss: 8.680673E+00 | loss scale: 4096.0 | grad norm: 171815.380 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 190/ 159576 | consumed samples: 3040 | elapsed time per iteration (ms): 13662.7 | learning rate: 8.432E-07 | global batch size: 16 | lm loss: 8.619300E+00 | loss scale: 4096.0 | grad norm: 80825.030 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 191/ 159576 | consumed samples: 3056 | elapsed time per iteration (ms): 13715.7 | learning rate: 8.476E-07 | global batch size: 16 | lm loss: 8.438683E+00 | loss scale: 4096.0 | grad norm: 68255.978 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 192/ 159576 | consumed samples: 3072 | elapsed time per iteration (ms): 13611.5 | learning rate: 8.521E-07 | global batch size: 16 | lm loss: 8.685935E+00 | loss scale: 4096.0 | grad norm: 100702.747 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 193/ 159576 | consumed samples: 3088 | elapsed time per iteration (ms): 14234.2 | learning rate: 8.565E-07 | global batch size: 16 | lm loss: 8.644808E+00 | loss scale: 4096.0 | grad norm: 193299.432 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 194/ 159576 | consumed samples: 3104 | elapsed time per iteration (ms): 13631.4 | learning rate: 8.609E-07 | global batch size: 16 | lm loss: 8.574228E+00 | loss scale: 4096.0 | grad norm: 141638.439 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 195/ 159576 | consumed samples: 3120 | elapsed time per iteration (ms): 13610.1 | learning rate: 8.654E-07 | global batch size: 16 | lm loss: 8.461662E+00 | loss scale: 4096.0 | grad norm: 102623.541 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 196/ 159576 | consumed samples: 3136 | elapsed time per iteration (ms): 13581.2 | learning rate: 8.698E-07 | global batch size: 16 | lm loss: 8.478310E+00 | loss scale: 4096.0 | grad norm: 64740.797 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 197/ 159576 | consumed samples: 3152 | elapsed time per iteration (ms): 13626.3 | learning rate: 8.743E-07 | global batch size: 16 | lm loss: 8.468125E+00 | loss scale: 4096.0 | grad norm: 113590.460 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 198/ 159576 | consumed samples: 3168 | elapsed time per iteration (ms): 14045.8 | learning rate: 8.787E-07 | global batch size: 16 | lm loss: 8.800446E+00 | loss scale: 4096.0 | grad norm: 157117.309 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 199/ 159576 | consumed samples: 3184 | elapsed time per iteration (ms): 13670.2 | learning rate: 8.831E-07 | global batch size: 16 | lm loss: 8.530574E+00 | loss scale: 4096.0 | grad norm: 71020.347 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 200/ 159576 | consumed samples: 3200 | elapsed time per iteration (ms): 13673.4 | learning rate: 8.876E-07 | global batch size: 16 | lm loss: 8.573134E+00 | loss scale: 4096.0 | grad norm: 68974.846 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 201/ 159576 | consumed samples: 3216 | elapsed time per iteration (ms): 13793.0 | learning rate: 8.920E-07 | global batch size: 16 | lm loss: 8.408599E+00 | loss scale: 4096.0 | grad norm: 69080.768 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 202/ 159576 | consumed samples: 3232 | elapsed time per iteration (ms): 13826.3 | learning rate: 8.964E-07 | global batch size: 16 | lm loss: 8.511511E+00 | loss scale: 4096.0 | grad norm: 111260.930 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 203/ 159576 | consumed samples: 3248 | elapsed time per iteration (ms): 13532.8 | learning rate: 9.009E-07 | global batch size: 16 | lm loss: 8.359414E+00 | loss scale: 4096.0 | grad norm: 178104.845 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 204/ 159576 | consumed samples: 3264 | elapsed time per iteration (ms): 13664.5 | learning rate: 9.053E-07 | global batch size: 16 | lm loss: 8.641071E+00 | loss scale: 4096.0 | grad norm: 200697.121 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 205/ 159576 | consumed samples: 3280 | elapsed time per iteration (ms): 13644.0 | learning rate: 9.098E-07 | global batch size: 16 | lm loss: 8.579686E+00 | loss scale: 4096.0 | grad norm: 127286.357 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 206/ 159576 | consumed samples: 3296 | elapsed time per iteration (ms): 14372.0 | learning rate: 9.142E-07 | global batch size: 16 | lm loss: 8.340457E+00 | loss scale: 4096.0 | grad norm: 79901.241 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 207/ 159576 | consumed samples: 3312 | elapsed time per iteration (ms): 13542.0 | learning rate: 9.186E-07 | global batch size: 16 | lm loss: 8.573874E+00 | loss scale: 4096.0 | grad norm: 54182.244 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 208/ 159576 | consumed samples: 3328 | elapsed time per iteration (ms): 13770.4 | learning rate: 9.231E-07 | global batch size: 16 | lm loss: 8.671753E+00 | loss scale: 4096.0 | grad norm: 118528.691 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 209/ 159576 | consumed samples: 3344 | elapsed time per iteration (ms): 13735.7 | learning rate: 9.275E-07 | global batch size: 16 | lm loss: 8.323320E+00 | loss scale: 4096.0 | grad norm: 84996.612 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 210/ 159576 | consumed samples: 3360 | elapsed time per iteration (ms): 13465.7 | learning rate: 9.320E-07 | global batch size: 16 | lm loss: 8.521966E+00 | loss scale: 4096.0 | grad norm: 58490.816 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 211/ 159576 | consumed samples: 3376 | elapsed time per iteration (ms): 14045.3 | learning rate: 9.364E-07 | global batch size: 16 | lm loss: 8.366361E+00 | loss scale: 4096.0 | grad norm: 60420.660 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 212/ 159576 | consumed samples: 3392 | elapsed time per iteration (ms): 13641.0 | learning rate: 9.408E-07 | global batch size: 16 | lm loss: 8.510538E+00 | loss scale: 4096.0 | grad norm: 107003.263 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 213/ 159576 | consumed samples: 3408 | elapsed time per iteration (ms): 13705.1 | learning rate: 9.453E-07 | global batch size: 16 | lm loss: 8.749462E+00 | loss scale: 4096.0 | grad norm: 127548.939 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 214/ 159576 | consumed samples: 3424 | elapsed time per iteration (ms): 13700.1 | learning rate: 9.497E-07 | global batch size: 16 | lm loss: 8.406161E+00 | loss scale: 4096.0 | grad norm: 77133.513 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 215/ 159576 | consumed samples: 3440 | elapsed time per iteration (ms): 14278.2 | learning rate: 9.541E-07 | global batch size: 16 | lm loss: 8.418405E+00 | loss scale: 4096.0 | grad norm: 62254.176 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 216/ 159576 | consumed samples: 3456 | elapsed time per iteration (ms): 13592.8 | learning rate: 9.586E-07 | global batch size: 16 | lm loss: 8.472538E+00 | loss scale: 4096.0 | grad norm: 50530.895 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 217/ 159576 | consumed samples: 3472 | elapsed time per iteration (ms): 13518.7 | learning rate: 9.630E-07 | global batch size: 16 | lm loss: 8.448650E+00 | loss scale: 4096.0 | grad norm: 80646.746 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 218/ 159576 | consumed samples: 3488 | elapsed time per iteration (ms): 13661.2 | learning rate: 9.675E-07 | global batch size: 16 | lm loss: 7.734177E+00 | loss scale: 4096.0 | grad norm: 149486.567 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 219/ 159576 | consumed samples: 3504 | elapsed time per iteration (ms): 14068.7 | learning rate: 9.719E-07 | global batch size: 16 | lm loss: 8.294590E+00 | loss scale: 4096.0 | grad norm: 56571.951 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 220/ 159576 | consumed samples: 3520 | elapsed time per iteration (ms): 13630.3 | learning rate: 9.763E-07 | global batch size: 16 | lm loss: 8.257124E+00 | loss scale: 4096.0 | grad norm: 62046.509 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 221/ 159576 | consumed samples: 3536 | elapsed time per iteration (ms): 13703.1 | learning rate: 9.808E-07 | global batch size: 16 | lm loss: 8.288898E+00 | loss scale: 4096.0 | grad norm: 59852.189 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 222/ 159576 | consumed samples: 3552 | elapsed time per iteration (ms): 13772.5 | learning rate: 9.852E-07 | global batch size: 16 | lm loss: 8.155066E+00 | loss scale: 4096.0 | grad norm: 58014.079 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 223/ 159576 | consumed samples: 3568 | elapsed time per iteration (ms): 13771.9 | learning rate: 9.896E-07 | global batch size: 16 | lm loss: 8.263331E+00 | loss scale: 4096.0 | grad norm: 63268.461 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 224/ 159576 | consumed samples: 3584 | elapsed time per iteration (ms): 14010.9 | learning rate: 9.941E-07 | global batch size: 16 | lm loss: 8.163802E+00 | loss scale: 4096.0 | grad norm: 57272.250 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 225/ 159576 | consumed samples: 3600 | elapsed time per iteration (ms): 13593.2 | learning rate: 9.985E-07 | global batch size: 16 | lm loss: 8.163125E+00 | loss scale: 4096.0 | grad norm: 42586.571 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 226/ 159576 | consumed samples: 3616 | elapsed time per iteration (ms): 13655.1 | learning rate: 1.003E-06 | global batch size: 16 | lm loss: 8.360060E+00 | loss scale: 4096.0 | grad norm: 122218.171 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 227/ 159576 | consumed samples: 3632 | elapsed time per iteration (ms): 13648.6 | learning rate: 1.007E-06 | global batch size: 16 | lm loss: 8.255043E+00 | loss scale: 4096.0 | grad norm: 85521.599 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 228/ 159576 | consumed samples: 3648 | elapsed time per iteration (ms): 14030.4 | learning rate: 1.012E-06 | global batch size: 16 | lm loss: 8.261985E+00 | loss scale: 4096.0 | grad norm: 67005.701 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 229/ 159576 | consumed samples: 3664 | elapsed time per iteration (ms): 13712.9 | learning rate: 1.016E-06 | global batch size: 16 | lm loss: 8.186491E+00 | loss scale: 4096.0 | grad norm: 56484.916 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 230/ 159576 | consumed samples: 3680 | elapsed time per iteration (ms): 13908.9 | learning rate: 1.021E-06 | global batch size: 16 | lm loss: 8.405298E+00 | loss scale: 4096.0 | grad norm: 76846.855 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 231/ 159576 | consumed samples: 3696 | elapsed time per iteration (ms): 13436.7 | learning rate: 1.025E-06 | global batch size: 16 | lm loss: 8.396565E+00 | loss scale: 4096.0 | grad norm: 65903.685 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 232/ 159576 | consumed samples: 3712 | elapsed time per iteration (ms): 13847.3 | learning rate: 1.030E-06 | global batch size: 16 | lm loss: 8.280029E+00 | loss scale: 4096.0 | grad norm: 49376.518 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 233/ 159576 | consumed samples: 3728 | elapsed time per iteration (ms): 13817.4 | learning rate: 1.034E-06 | global batch size: 16 | lm loss: 8.356775E+00 | loss scale: 4096.0 | grad norm: 59866.023 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 234/ 159576 | consumed samples: 3744 | elapsed time per iteration (ms): 13586.3 | learning rate: 1.038E-06 | global batch size: 16 | lm loss: 8.429869E+00 | loss scale: 4096.0 | grad norm: 177436.133 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 235/ 159576 | consumed samples: 3760 | elapsed time per iteration (ms): 13599.7 | learning rate: 1.043E-06 | global batch size: 16 | lm loss: 8.434436E+00 | loss scale: 4096.0 | grad norm: 135413.910 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 236/ 159576 | consumed samples: 3776 | elapsed time per iteration (ms): 13650.1 | learning rate: 1.047E-06 | global batch size: 16 | lm loss: 8.271558E+00 | loss scale: 4096.0 | grad norm: 90861.034 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 237/ 159576 | consumed samples: 3792 | elapsed time per iteration (ms): 14163.4 | learning rate: 1.052E-06 | global batch size: 16 | lm loss: 8.303068E+00 | loss scale: 4096.0 | grad norm: 54299.730 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 238/ 159576 | consumed samples: 3808 | elapsed time per iteration (ms): 13595.2 | learning rate: 1.056E-06 | global batch size: 16 | lm loss: 8.246891E+00 | loss scale: 4096.0 | grad norm: 58398.807 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 239/ 159576 | consumed samples: 3824 | elapsed time per iteration (ms): 13633.1 | learning rate: 1.061E-06 | global batch size: 16 | lm loss: 8.223282E+00 | loss scale: 4096.0 | grad norm: 58574.140 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 240/ 159576 | consumed samples: 3840 | elapsed time per iteration (ms): 13623.5 | learning rate: 1.065E-06 | global batch size: 16 | lm loss: 8.408007E+00 | loss scale: 4096.0 | grad norm: 128668.081 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 241/ 159576 | consumed samples: 3856 | elapsed time per iteration (ms): 14073.7 | learning rate: 1.070E-06 | global batch size: 16 | lm loss: 8.490035E+00 | loss scale: 4096.0 | grad norm: 228763.576 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 242/ 159576 | consumed samples: 3872 | elapsed time per iteration (ms): 13568.7 | learning rate: 1.074E-06 | global batch size: 16 | lm loss: 8.217072E+00 | loss scale: 4096.0 | grad norm: 54955.773 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 243/ 159576 | consumed samples: 3888 | elapsed time per iteration (ms): 13649.7 | learning rate: 1.078E-06 | global batch size: 16 | lm loss: 8.280759E+00 | loss scale: 4096.0 | grad norm: 70277.633 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 244/ 159576 | consumed samples: 3904 | elapsed time per iteration (ms): 13743.3 | learning rate: 1.083E-06 | global batch size: 16 | lm loss: 8.266622E+00 | loss scale: 4096.0 | grad norm: 52088.661 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 245/ 159576 | consumed samples: 3920 | elapsed time per iteration (ms): 13760.9 | learning rate: 1.087E-06 | global batch size: 16 | lm loss: 8.186391E+00 | loss scale: 4096.0 | grad norm: 45303.389 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 246/ 159576 | consumed samples: 3936 | elapsed time per iteration (ms): 13869.6 | learning rate: 1.092E-06 | global batch size: 16 | lm loss: 8.217053E+00 | loss scale: 4096.0 | grad norm: 66052.613 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 247/ 159576 | consumed samples: 3952 | elapsed time per iteration (ms): 13595.0 | learning rate: 1.096E-06 | global batch size: 16 | lm loss: 8.218720E+00 | loss scale: 4096.0 | grad norm: 63154.139 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 248/ 159576 | consumed samples: 3968 | elapsed time per iteration (ms): 13605.0 | learning rate: 1.101E-06 | global batch size: 16 | lm loss: 8.214328E+00 | loss scale: 4096.0 | grad norm: 54827.602 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 249/ 159576 | consumed samples: 3984 | elapsed time per iteration (ms): 13572.6 | learning rate: 1.105E-06 | global batch size: 16 | lm loss: 8.289627E+00 | loss scale: 4096.0 | grad norm: 112939.295 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 250/ 159576 | consumed samples: 4000 | elapsed time per iteration (ms): 13869.8 | learning rate: 1.109E-06 | global batch size: 16 | lm loss: 8.362014E+00 | loss scale: 4096.0 | grad norm: 56746.466 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 251/ 159576 | consumed samples: 4016 | elapsed time per iteration (ms): 13620.5 | learning rate: 1.114E-06 | global batch size: 16 | lm loss: 8.189938E+00 | loss scale: 4096.0 | grad norm: 56152.282 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 252/ 159576 | consumed samples: 4032 | elapsed time per iteration (ms): 13708.2 | learning rate: 1.118E-06 | global batch size: 16 | lm loss: 8.356908E+00 | loss scale: 4096.0 | grad norm: 78498.467 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 253/ 159576 | consumed samples: 4048 | elapsed time per iteration (ms): 13478.4 | learning rate: 1.123E-06 | global batch size: 16 | lm loss: 8.047684E+00 | loss scale: 4096.0 | grad norm: 66252.882 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 254/ 159576 | consumed samples: 4064 | elapsed time per iteration (ms): 14231.8 | learning rate: 1.127E-06 | global batch size: 16 | lm loss: 8.279363E+00 | loss scale: 4096.0 | grad norm: 85125.935 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 255/ 159576 | consumed samples: 4080 | elapsed time per iteration (ms): 13522.4 | learning rate: 1.132E-06 | global batch size: 16 | lm loss: 8.159877E+00 | loss scale: 4096.0 | grad norm: 48952.267 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 256/ 159576 | consumed samples: 4096 | elapsed time per iteration (ms): 13553.5 | learning rate: 1.136E-06 | global batch size: 16 | lm loss: 8.154376E+00 | loss scale: 4096.0 | grad norm: 41715.920 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 257/ 159576 | consumed samples: 4112 | elapsed time per iteration (ms): 13537.5 | learning rate: 1.141E-06 | global batch size: 16 | lm loss: 8.247561E+00 | loss scale: 4096.0 | grad norm: 57864.708 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 258/ 159576 | consumed samples: 4128 | elapsed time per iteration (ms): 13659.5 | learning rate: 1.145E-06 | global batch size: 16 | lm loss: 8.167631E+00 | loss scale: 4096.0 | grad norm: 45439.745 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 259/ 159576 | consumed samples: 4144 | elapsed time per iteration (ms): 14023.4 | learning rate: 1.149E-06 | global batch size: 16 | lm loss: 8.081510E+00 | loss scale: 4096.0 | grad norm: 54108.939 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 260/ 159576 | consumed samples: 4160 | elapsed time per iteration (ms): 13447.5 | learning rate: 1.154E-06 | global batch size: 16 | lm loss: 8.074065E+00 | loss scale: 4096.0 | grad norm: 45799.989 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 261/ 159576 | consumed samples: 4176 | elapsed time per iteration (ms): 13604.0 | learning rate: 1.158E-06 | global batch size: 16 | lm loss: 8.134088E+00 | loss scale: 4096.0 | grad norm: 34426.421 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 262/ 159576 | consumed samples: 4192 | elapsed time per iteration (ms): 13632.5 | learning rate: 1.163E-06 | global batch size: 16 | lm loss: 8.331153E+00 | loss scale: 4096.0 | grad norm: 241742.321 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 263/ 159576 | consumed samples: 4208 | elapsed time per iteration (ms): 14049.0 | learning rate: 1.167E-06 | global batch size: 16 | lm loss: 8.300336E+00 | loss scale: 4096.0 | grad norm: 89382.639 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 264/ 159576 | consumed samples: 4224 | elapsed time per iteration (ms): 13554.0 | learning rate: 1.172E-06 | global batch size: 16 | lm loss: 8.285131E+00 | loss scale: 4096.0 | grad norm: 56471.162 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 265/ 159576 | consumed samples: 4240 | elapsed time per iteration (ms): 13594.4 | learning rate: 1.176E-06 | global batch size: 16 | lm loss: 8.247953E+00 | loss scale: 4096.0 | grad norm: 59934.542 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 266/ 159576 | consumed samples: 4256 | elapsed time per iteration (ms): 13722.5 | learning rate: 1.180E-06 | global batch size: 16 | lm loss: 8.086367E+00 | loss scale: 4096.0 | grad norm: 49794.894 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 267/ 159576 | consumed samples: 4272 | elapsed time per iteration (ms): 13925.6 | learning rate: 1.185E-06 | global batch size: 16 | lm loss: 8.364625E+00 | loss scale: 4096.0 | grad norm: 198667.364 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 268/ 159576 | consumed samples: 4288 | elapsed time per iteration (ms): 13685.9 | learning rate: 1.189E-06 | global batch size: 16 | lm loss: 8.378025E+00 | loss scale: 4096.0 | grad norm: 206726.678 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 269/ 159576 | consumed samples: 4304 | elapsed time per iteration (ms): 13784.2 | learning rate: 1.194E-06 | global batch size: 16 | lm loss: 8.309950E+00 | loss scale: 4096.0 | grad norm: 102692.516 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 270/ 159576 | consumed samples: 4320 | elapsed time per iteration (ms): 13426.6 | learning rate: 1.198E-06 | global batch size: 16 | lm loss: 8.437682E+00 | loss scale: 4096.0 | grad norm: 53779.480 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 271/ 159576 | consumed samples: 4336 | elapsed time per iteration (ms): 13590.5 |
learning rate: 1.203E-06 | global batch size: 16 | lm loss: 8.180303E+00 | loss scale: 4096.0 | grad norm: 41837.204 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 272/ 159576 | consumed samples: 4352 | elapsed time per iteration (ms): 13918.1 | learning rate: 1.207E-06 | global batch size: 16 | lm loss: 8.269817E+00 | loss scale: 4096.0 | grad norm: 60250.869 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 273/ 159576 | consumed samples: 4368 | elapsed time per iteration (ms): 13764.9 | learning rate: 1.212E-06 | global batch size: 16 | lm loss: 8.196259E+00 | loss scale: 4096.0 | grad norm: 51310.508 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 274/ 159576 | consumed samples: 4384 | elapsed time per iteration (ms): 13543.7 | learning rate: 1.216E-06 | global batch size: 16 | lm loss: 8.111527E+00 | loss scale: 4096.0 | grad norm: 62869.218 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 275/ 159576 | consumed samples: 4400 | elapsed time per iteration (ms): 13741.6 | learning rate: 1.220E-06 | global batch size: 16 | lm loss: 8.196915E+00 | loss scale: 4096.0 | grad norm: 56382.422 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 276/ 159576 | consumed samples: 4416 | elapsed time per iteration (ms): 14418.6 | learning rate: 1.225E-06 | global batch size: 16 | lm loss: 8.163618E+00 | loss scale: 4096.0 | grad norm: 59897.745 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 277/ 159576 | consumed samples: 4432 | elapsed time per iteration (ms): 13488.6 | learning rate: 1.229E-06 | global batch size: 16 | lm loss: 8.232466E+00 | loss scale: 4096.0 | grad norm: 106883.652 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan 
iterations: 0 | time (ms) iteration 278/ 159576 | consumed samples: 4448 | elapsed time per iteration (ms): 13680.7 | learning rate: 1.234E-06 | global batch size: 16 | lm loss: 8.285415E+00 | loss scale: 4096.0 | grad norm: 52155.013 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 279/ 159576 | consumed samples: 4464 | elapsed time per iteration (ms): 13663.3 | learning rate: 1.238E-06 | global batch size: 16 | lm loss: 8.221471E+00 | loss scale: 4096.0 | grad norm: 43151.453 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 280/ 159576 | consumed samples: 4480 | elapsed time per iteration (ms): 13783.3 | learning rate: 1.243E-06 | global batch size: 16 | lm loss: 7.827011E+00 | loss scale: 4096.0 | grad norm: 60081.852 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 281/ 159576 | consumed samples: 4496 | elapsed time per iteration (ms): 13993.1 | learning rate: 1.247E-06 | global batch size: 16 | lm loss: 8.016405E+00 | loss scale: 4096.0 | grad norm: 60969.434 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 282/ 159576 | consumed samples: 4512 | elapsed time per iteration (ms): 13747.2 | learning rate: 1.251E-06 | global batch size: 16 | lm loss: 8.205744E+00 | loss scale: 4096.0 | grad norm: 64657.162 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 283/ 159576 | consumed samples: 4528 | elapsed time per iteration (ms): 13732.1 | learning rate: 1.256E-06 | global batch size: 16 | lm loss: 8.225381E+00 | loss scale: 4096.0 | grad norm: 46007.720 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 284/ 159576 | consumed samples: 4544 | elapsed time per iteration (ms): 13701.8 | learning rate: 1.260E-06 | global batch size: 16 | lm loss: 
8.069484E+00 | loss scale: 4096.0 | grad norm: 50539.571 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 285/ 159576 | consumed samples: 4560 | elapsed time per iteration (ms): 13774.1 | learning rate: 1.265E-06 | global batch size: 16 | lm loss: 8.313256E+00 | loss scale: 4096.0 | grad norm: 75301.930 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 286/ 159576 | consumed samples: 4576 | elapsed time per iteration (ms): 13700.1 | learning rate: 1.269E-06 | global batch size: 16 | lm loss: 8.296308E+00 | loss scale: 4096.0 | grad norm: 109402.142 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 287/ 159576 | consumed samples: 4592 | elapsed time per iteration (ms): 13678.1 | learning rate: 1.274E-06 | global batch size: 16 | lm loss: 8.245502E+00 | loss scale: 4096.0 | grad norm: 53639.635 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 288/ 159576 | consumed samples: 4608 | elapsed time per iteration (ms): 13698.6 | learning rate: 1.278E-06 | global batch size: 16 | lm loss: 8.137961E+00 | loss scale: 4096.0 | grad norm: 42750.465 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 289/ 159576 | consumed samples: 4624 | elapsed time per iteration (ms): 14172.7 | learning rate: 1.283E-06 | global batch size: 16 | lm loss: 8.187901E+00 | loss scale: 4096.0 | grad norm: 108265.490 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 290/ 159576 | consumed samples: 4640 | elapsed time per iteration (ms): 13663.7 | learning rate: 1.287E-06 | global batch size: 16 | lm loss: 8.092007E+00 | loss scale: 4096.0 | grad norm: 61613.623 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 291/ 159576 | consumed 
samples: 4656 | elapsed time per iteration (ms): 13802.2 | learning rate: 1.291E-06 | global batch size: 16 | lm loss: 8.140871E+00 | loss scale: 4096.0 | grad norm: 73138.188 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 292/ 159576 | consumed samples: 4672 | elapsed time per iteration (ms): 13588.8 | learning rate: 1.296E-06 | global batch size: 16 | lm loss: 8.096482E+00 | loss scale: 4096.0 | grad norm: 56947.365 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 293/ 159576 | consumed samples: 4688 | elapsed time per iteration (ms): 13692.3 | learning rate: 1.300E-06 | global batch size: 16 | lm loss: 8.261303E+00 | loss scale: 4096.0 | grad norm: 50306.115 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 294/ 159576 | consumed samples: 4704 | elapsed time per iteration (ms): 13953.1 | learning rate: 1.305E-06 | global batch size: 16 | lm loss: 8.088846E+00 | loss scale: 4096.0 | grad norm: 70651.882 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 295/ 159576 | consumed samples: 4720 | elapsed time per iteration (ms): 13681.7 | learning rate: 1.309E-06 | global batch size: 16 | lm loss: 8.216883E+00 | loss scale: 4096.0 | grad norm: 109748.850 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 296/ 159576 | consumed samples: 4736 | elapsed time per iteration (ms): 13680.1 | learning rate: 1.314E-06 | global batch size: 16 | lm loss: 8.011025E+00 | loss scale: 4096.0 | grad norm: 57863.308 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 297/ 159576 | consumed samples: 4752 | elapsed time per iteration (ms): 13766.7 | learning rate: 1.318E-06 | global batch size: 16 | lm loss: 8.023094E+00 | loss scale: 4096.0 | grad norm: 39732.348 | num 
zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 298/ 159576 | consumed samples: 4768 | elapsed time per iteration (ms): 14056.0 | learning rate: 1.322E-06 | global batch size: 16 | lm loss: 8.085699E+00 | loss scale: 4096.0 | grad norm: 93534.410 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 299/ 159576 | consumed samples: 4784 | elapsed time per iteration (ms): 13507.1 | learning rate: 1.327E-06 | global batch size: 16 | lm loss: 8.410425E+00 | loss scale: 4096.0 | grad norm: 42550.581 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 300/ 159576 | consumed samples: 4800 | elapsed time per iteration (ms): 13670.9 | learning rate: 1.331E-06 | global batch size: 16 | lm loss: 8.125405E+00 | loss scale: 4096.0 | grad norm: 37244.445 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 301/ 159576 | consumed samples: 4816 | elapsed time per iteration (ms): 13643.0 | learning rate: 1.336E-06 | global batch size: 16 | lm loss: 7.945562E+00 | loss scale: 4096.0 | grad norm: 37921.680 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 302/ 159576 | consumed samples: 4832 | elapsed time per iteration (ms): 14097.2 | learning rate: 1.340E-06 | global batch size: 16 | lm loss: 8.073545E+00 | loss scale: 4096.0 | grad norm: 80879.552 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 303/ 159576 | consumed samples: 4848 | elapsed time per iteration (ms): 13625.2 | learning rate: 1.345E-06 | global batch size: 16 | lm loss: 8.224352E+00 | loss scale: 4096.0 | grad norm: 75920.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 304/ 159576 | consumed samples: 4864 | elapsed time per iteration (ms): 13709.0 | learning 
rate: 1.349E-06 | global batch size: 16 | lm loss: 8.025059E+00 | loss scale: 4096.0 | grad norm: 39535.605 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 305/ 159576 | consumed samples: 4880 | elapsed time per iteration (ms): 13741.5 | learning rate: 1.354E-06 | global batch size: 16 | lm loss: 8.094482E+00 | loss scale: 4096.0 | grad norm: 40630.922 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 306/ 159576 | consumed samples: 4896 | elapsed time per iteration (ms): 13523.7 | learning rate: 1.358E-06 | global batch size: 16 | lm loss: 8.135887E+00 | loss scale: 4096.0 | grad norm: 80825.550 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 307/ 159576 | consumed samples: 4912 | elapsed time per iteration (ms): 14093.4 | learning rate: 1.362E-06 | global batch size: 16 | lm loss: 8.292034E+00 | loss scale: 4096.0 | grad norm: 86171.888 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 308/ 159576 | consumed samples: 4928 | elapsed time per iteration (ms): 13647.9 | learning rate: 1.367E-06 | global batch size: 16 | lm loss: 8.204563E+00 | loss scale: 4096.0 | grad norm: 46698.010 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 309/ 159576 | consumed samples: 4944 | elapsed time per iteration (ms): 13637.2 | learning rate: 1.371E-06 | global batch size: 16 | lm loss: 8.033182E+00 | loss scale: 4096.0 | grad norm: 42089.185 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 310/ 159576 | consumed samples: 4960 | elapsed time per iteration (ms): 13700.6 | learning rate: 1.376E-06 | global batch size: 16 | lm loss: 8.048797E+00 | loss scale: 4096.0 | grad norm: 56022.805 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 
0 | time (ms) iteration 311/ 159576 | consumed samples: 4976 | elapsed time per iteration (ms): 14085.5 | learning rate: 1.380E-06 | global batch size: 16 | lm loss: 7.623003E+00 | loss scale: 4096.0 | grad norm: 72171.220 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 312/ 159576 | consumed samples: 4992 | elapsed time per iteration (ms): 13830.9 | learning rate: 1.385E-06 | global batch size: 16 | lm loss: 8.082812E+00 | loss scale: 4096.0 | grad norm: 39681.453 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 313/ 159576 | consumed samples: 5008 | elapsed time per iteration (ms): 13533.9 | learning rate: 1.389E-06 | global batch size: 16 | lm loss: 8.116117E+00 | loss scale: 4096.0 | grad norm: 33726.889 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 314/ 159576 | consumed samples: 5024 | elapsed time per iteration (ms): 13637.3 | learning rate: 1.393E-06 | global batch size: 16 | lm loss: 8.210217E+00 | loss scale: 4096.0 | grad norm: 89402.073 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 315/ 159576 | consumed samples: 5040 | elapsed time per iteration (ms): 14136.6 | learning rate: 1.398E-06 | global batch size: 16 | lm loss: 7.798199E+00 | loss scale: 4096.0 | grad norm: 83566.570 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 316/ 159576 | consumed samples: 5056 | elapsed time per iteration (ms): 13651.3 | learning rate: 1.402E-06 | global batch size: 16 | lm loss: 8.066372E+00 | loss scale: 4096.0 | grad norm: 38768.697 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 317/ 159576 | consumed samples: 5072 | elapsed time per iteration (ms): 13641.7 | learning rate: 1.407E-06 | global batch size: 16 | lm loss: 7.876265E+00 | loss 
scale: 4096.0 | grad norm: 36174.406 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 318/ 159576 | consumed samples: 5088 | elapsed time per iteration (ms): 13653.8 | learning rate: 1.411E-06 | global batch size: 16 | lm loss: 7.979768E+00 | loss scale: 4096.0 | grad norm: 66651.391 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 319/ 159576 | consumed samples: 5104 | elapsed time per iteration (ms): 13755.9 | learning rate: 1.416E-06 | global batch size: 16 | lm loss: 8.094232E+00 | loss scale: 4096.0 | grad norm: 79088.558 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 320/ 159576 | consumed samples: 5120 | elapsed time per iteration (ms): 13900.8 | learning rate: 1.420E-06 | global batch size: 16 | lm loss: 8.113304E+00 | loss scale: 4096.0 | grad norm: 52331.401 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 321/ 159576 | consumed samples: 5136 | elapsed time per iteration (ms): 13649.9 | learning rate: 1.425E-06 | global batch size: 16 | lm loss: 8.128990E+00 | loss scale: 4096.0 | grad norm: 46927.679 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 322/ 159576 | consumed samples: 5152 | elapsed time per iteration (ms): 13693.6 | learning rate: 1.429E-06 | global batch size: 16 | lm loss: 8.486778E+00 | loss scale: 4096.0 | grad norm: 89462.672 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 323/ 159576 | consumed samples: 5168 | elapsed time per iteration (ms): 13699.8 | learning rate: 1.433E-06 | global batch size: 16 | lm loss: 8.051263E+00 | loss scale: 4096.0 | grad norm: 42680.523 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 324/ 159576 | consumed samples: 5184 | elapsed 
time per iteration (ms): 14041.8 | learning rate: 1.438E-06 | global batch size: 16 | lm loss: 8.181097E+00 | loss scale: 4096.0 | grad norm: 43801.136 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 325/ 159576 | consumed samples: 5200 | elapsed time per iteration (ms): 13711.0 | learning rate: 1.442E-06 | global batch size: 16 | lm loss: 8.171723E+00 | loss scale: 4096.0 | grad norm: 47748.407 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 326/ 159576 | consumed samples: 5216 | elapsed time per iteration (ms): 13743.3 | learning rate: 1.447E-06 | global batch size: 16 | lm loss: 8.035454E+00 | loss scale: 4096.0 | grad norm: 58353.227 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 327/ 159576 | consumed samples: 5232 | elapsed time per iteration (ms): 13602.7 | learning rate: 1.451E-06 | global batch size: 16 | lm loss: 8.021453E+00 | loss scale: 4096.0 | grad norm: 44165.609 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 328/ 159576 | consumed samples: 5248 | elapsed time per iteration (ms): 13748.9 | learning rate: 1.456E-06 | global batch size: 16 | lm loss: 8.051726E+00 | loss scale: 4096.0 | grad norm: 35138.807 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 329/ 159576 | consumed samples: 5264 | elapsed time per iteration (ms): 13961.7 | learning rate: 1.460E-06 | global batch size: 16 | lm loss: 7.960547E+00 | loss scale: 4096.0 | grad norm: 41197.060 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 330/ 159576 | consumed samples: 5280 | elapsed time per iteration (ms): 13633.4 | learning rate: 1.464E-06 | global batch size: 16 | lm loss: 8.084079E+00 | loss scale: 4096.0 | grad norm: 43199.182 | num zeros: 0.0 | number of 
skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 331/ 159576 | consumed samples: 5296 | elapsed time per iteration (ms): 13678.9 | learning rate: 1.469E-06 | global batch size: 16 | lm loss: 8.243130E+00 | loss scale: 4096.0 | grad norm: 39935.584 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 332/ 159576 | consumed samples: 5312 | elapsed time per iteration (ms): 13653.3 | learning rate: 1.473E-06 | global batch size: 16 | lm loss: 8.148146E+00 | loss scale: 4096.0 | grad norm: 31710.971 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 333/ 159576 | consumed samples: 5328 | elapsed time per iteration (ms): 13982.9 | learning rate: 1.478E-06 | global batch size: 16 | lm loss: 8.055049E+00 | loss scale: 4096.0 | grad norm: 40555.458 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 334/ 159576 | consumed samples: 5344 | elapsed time per iteration (ms): 13576.5 | learning rate: 1.482E-06 | global batch size: 16 | lm loss: 8.154724E+00 | loss scale: 4096.0 | grad norm: 98189.157 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 335/ 159576 | consumed samples: 5360 | elapsed time per iteration (ms): 13666.3 | learning rate: 1.487E-06 | global batch size: 16 | lm loss: 8.056485E+00 | loss scale: 4096.0 | grad norm: 53277.066 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 336/ 159576 | consumed samples: 5376 | elapsed time per iteration (ms): 13667.7 | learning rate: 1.491E-06 | global batch size: 16 | lm loss: 7.902112E+00 | loss scale: 4096.0 | grad norm: 35520.620 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 337/ 159576 | consumed samples: 5392 | elapsed time per iteration (ms): 14189.1 | learning rate: 1.496E-06 | 
global batch size: 16 | lm loss: 8.211933E+00 | loss scale: 4096.0 | grad norm: 102636.452 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 338/ 159576 | consumed samples: 5408 | elapsed time per iteration (ms): 13538.3 | learning rate: 1.500E-06 | global batch size: 16 | lm loss: 8.077993E+00 | loss scale: 4096.0 | grad norm: 74161.424 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 339/ 159576 | consumed samples: 5424 | elapsed time per iteration (ms): 13690.1 | learning rate: 1.504E-06 | global batch size: 16 | lm loss: 8.002722E+00 | loss scale: 4096.0 | grad norm: 41178.202 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 340/ 159576 | consumed samples: 5440 | elapsed time per iteration (ms): 13761.4 | learning rate: 1.509E-06 | global batch size: 16 | lm loss: 8.070647E+00 | loss scale: 4096.0 | grad norm: 146660.160 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 341/ 159576 | consumed samples: 5456 | elapsed time per iteration (ms): 13679.6 | learning rate: 1.513E-06 | global batch size: 16 | lm loss: 8.211810E+00 | loss scale: 4096.0 | grad norm: 56011.276 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 342/ 159576 | consumed samples: 5472 | elapsed time per iteration (ms): 13958.7 | learning rate: 1.518E-06 | global batch size: 16 | lm loss: 8.028828E+00 | loss scale: 4096.0 | grad norm: 45507.509 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 343/ 159576 | consumed samples: 5488 | elapsed time per iteration (ms): 13796.1 | learning rate: 1.522E-06 | global batch size: 16 | lm loss: 8.000618E+00 | loss scale: 4096.0 | grad norm: 41366.016 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) 
iteration 344/ 159576 | consumed samples: 5504 | elapsed time per iteration (ms): 13566.5 | learning rate: 1.527E-06 | global batch size: 16 | lm loss: 8.106353E+00 | loss scale: 4096.0 | grad norm: 86487.826 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 345/ 159576 | consumed samples: 5520 | elapsed time per iteration (ms): 13617.7 | learning rate: 1.531E-06 | global batch size: 16 | lm loss: 8.130958E+00 | loss scale: 4096.0 | grad norm: 65559.636 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 346/ 159576 | consumed samples: 5536 | elapsed time per iteration (ms): 14006.3 | learning rate: 1.536E-06 | global batch size: 16 | lm loss: 8.100373E+00 | loss scale: 4096.0 | grad norm: 50918.888 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 347/ 159576 | consumed samples: 5552 | elapsed time per iteration (ms): 13652.0 | learning rate: 1.540E-06 | global batch size: 16 | lm loss: 8.193462E+00 | loss scale: 4096.0 | grad norm: 49482.923 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 348/ 159576 | consumed samples: 5568 | elapsed time per iteration (ms): 13785.4 | learning rate: 1.544E-06 | global batch size: 16 | lm loss: 8.185720E+00 | loss scale: 4096.0 | grad norm: 33616.818 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 349/ 159576 | consumed samples: 5584 | elapsed time per iteration (ms): 13534.7 | learning rate: 1.549E-06 | global batch size: 16 | lm loss: 7.997324E+00 | loss scale: 4096.0 | grad norm: 41224.808 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 350/ 159576 | consumed samples: 5600 | elapsed time per iteration (ms): 14148.0 | learning rate: 1.553E-06 | global batch size: 16 | lm loss: 8.069170E+00 | loss scale: 4096.0 | 
grad norm: 61139.413 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 351/ 159576 | consumed samples: 5616 | elapsed time per iteration (ms): 13626.0 | learning rate: 1.558E-06 | global batch size: 16 | lm loss: 8.052499E+00 | loss scale: 4096.0 | grad norm: 58965.426 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 352/ 159576 | consumed samples: 5632 | elapsed time per iteration (ms): 13633.5 | learning rate: 1.562E-06 | global batch size: 16 | lm loss: 8.036291E+00 | loss scale: 4096.0 | grad norm: 38820.487 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 353/ 159576 | consumed samples: 5648 | elapsed time per iteration (ms): 13648.6 | learning rate: 1.567E-06 | global batch size: 16 | lm loss: 8.007360E+00 | loss scale: 4096.0 | grad norm: 33342.929 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 354/ 159576 | consumed samples: 5664 | elapsed time per iteration (ms): 13707.0 | learning rate: 1.571E-06 | global batch size: 16 | lm loss: 7.890161E+00 | loss scale: 4096.0 | grad norm: 62589.896 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 355/ 159576 | consumed samples: 5680 | elapsed time per iteration (ms): 14101.4 | learning rate: 1.575E-06 | global batch size: 16 | lm loss: 8.034273E+00 | loss scale: 4096.0 | grad norm: 62100.784 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 356/ 159576 | consumed samples: 5696 | elapsed time per iteration (ms): 13548.4 | learning rate: 1.580E-06 | global batch size: 16 | lm loss: 7.964279E+00 | loss scale: 4096.0 | grad norm: 37283.643 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 357/ 159576 | consumed samples: 5712 | elapsed time per 
iteration (ms): 13655.3 | learning rate: 1.584E-06 | global batch size: 16 | lm loss: 7.882459E+00 | loss scale: 4096.0 | grad norm: 36278.786 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 358/ 159576 | consumed samples: 5728 | elapsed time per iteration (ms): 13872.1 | learning rate: 1.589E-06 | global batch size: 16 | lm loss: 8.081428E+00 | loss scale: 4096.0 | grad norm: 59624.520 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 359/ 159576 | consumed samples: 5744 | elapsed time per iteration (ms): 13830.3 | learning rate: 1.593E-06 | global batch size: 16 | lm loss: 8.345490E+00 | loss scale: 4096.0 | grad norm: 101818.152 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 360/ 159576 | consumed samples: 5760 | elapsed time per iteration (ms): 13738.3 | learning rate: 1.598E-06 | global batch size: 16 | lm loss: 8.090802E+00 | loss scale: 4096.0 | grad norm: 37735.210 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 361/ 159576 | consumed samples: 5776 | elapsed time per iteration (ms): 13673.7 | learning rate: 1.602E-06 | global batch size: 16 | lm loss: 7.934822E+00 | loss scale: 4096.0 | grad norm: 35051.225 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 362/ 159576 | consumed samples: 5792 | elapsed time per iteration (ms): 13779.0 | learning rate: 1.607E-06 | global batch size: 16 | lm loss: 8.217977E+00 | loss scale: 4096.0 | grad norm: 81671.155 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 363/ 159576 | consumed samples: 5808 | elapsed time per iteration (ms): 14148.6 | learning rate: 1.611E-06 | global batch size: 16 | lm loss: 7.956856E+00 | loss scale: 4096.0 | grad norm: 123728.069 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 364/ 159576 | consumed samples: 5824 | elapsed time per iteration (ms): 13509.6 | learning rate: 1.615E-06 | global batch size: 16 | lm loss: 7.980748E+00 | loss scale: 4096.0 | grad norm: 64323.538 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 365/ 159576 | consumed samples: 5840 | elapsed time per iteration (ms): 13791.1 | learning rate: 1.620E-06 | global batch size: 16 | lm loss: 7.927495E+00 | loss scale: 4096.0 | grad norm: 38595.229 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 366/ 159576 | consumed samples: 5856 | elapsed time per iteration (ms): 13535.8 | learning rate: 1.624E-06 | global batch size: 16 | lm loss: 7.992770E+00 | loss scale: 4096.0 | grad norm: 34786.799 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 367/ 159576 | consumed samples: 5872 | elapsed time per iteration (ms): 13709.6 | learning rate: 1.629E-06 | global batch size: 16 | lm loss: 8.033854E+00 | loss scale: 4096.0 | grad norm: 26681.238 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 368/ 159576 | consumed samples: 5888 | elapsed time per iteration (ms): 13923.8 | learning rate: 1.633E-06 | global batch size: 16 | lm loss: 8.086361E+00 | loss scale: 4096.0 | grad norm: 116063.612 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 369/ 159576 | consumed samples: 5904 | elapsed time per iteration (ms): 13743.2 | learning rate: 1.638E-06 | global batch size: 16 | lm loss: 8.136069E+00 | loss scale: 4096.0 | grad norm: 192843.981 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 370/ 159576 | consumed samples: 5920 | elapsed time per iteration (ms): 13586.5 | learning rate: 1.642E-06 | global batch size: 16 | lm loss: 8.213842E+00 | loss scale: 4096.0 | grad norm: 66749.630 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 371/ 159576 | consumed samples: 5936 | elapsed time per iteration (ms): 13637.5 | learning rate: 1.646E-06 | global batch size: 16 | lm loss: 7.862526E+00 | loss scale: 4096.0 | grad norm: 35628.877 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 372/ 159576 | consumed samples: 5952 | elapsed time per iteration (ms): 14269.3 | learning rate: 1.651E-06 | global batch size: 16 | lm loss: 8.111351E+00 | loss scale: 4096.0 | grad norm: 51284.654 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 373/ 159576 | consumed samples: 5968 | elapsed time per iteration (ms): 13424.8 | learning rate: 1.655E-06 | global batch size: 16 | lm loss: 7.860275E+00 | loss scale: 4096.0 | grad norm: 51885.287 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 374/ 159576 | consumed samples: 5984 | elapsed time per iteration (ms): 13638.9 | learning rate: 1.660E-06 | global batch size: 16 | lm loss: 7.995843E+00 | loss scale: 4096.0 | grad norm: 40982.716 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 375/ 159576 | consumed samples: 6000 | elapsed time per iteration (ms): 13719.8 | learning rate: 1.664E-06 | global batch size: 16 | lm loss: 7.989121E+00 | loss scale: 4096.0 | grad norm: 43694.588 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 376/ 159576 | consumed samples: 6016 | elapsed time per iteration (ms): 13718.2 | learning rate: 1.669E-06 | global batch size: 16 | lm loss: 8.054690E+00 | loss scale: 4096.0 | grad norm: 56142.201 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 377/ 159576 | consumed samples: 6032 | elapsed time per iteration (ms): 14087.0 | learning rate: 1.673E-06 | global batch size: 16 | lm loss: 8.145277E+00 | loss scale: 4096.0 | grad norm: 77837.877 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 378/ 159576 | consumed samples: 6048 | elapsed time per iteration (ms): 13621.7 | learning rate: 1.678E-06 | global batch size: 16 | lm loss: 7.879861E+00 | loss scale: 4096.0 | grad norm: 35054.780 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 379/ 159576 | consumed samples: 6064 | elapsed time per iteration (ms): 13676.7 | learning rate: 1.682E-06 | global batch size: 16 | lm loss: 7.996103E+00 | loss scale: 4096.0 | grad norm: 31871.611 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 380/ 159576 | consumed samples: 6080 | elapsed time per iteration (ms): 13756.2 | learning rate: 1.686E-06 | global batch size: 16 | lm loss: 7.788074E+00 | loss scale: 4096.0 | grad norm: 30378.507 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 381/ 159576 | consumed samples: 6096 | elapsed time per iteration (ms): 13731.7 | learning rate: 1.691E-06 | global batch size: 16 | lm loss: 7.998044E+00 | loss scale: 4096.0 | grad norm: 78167.228 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 382/ 159576 | consumed samples: 6112 | elapsed time per iteration (ms): 13696.8 | learning rate: 1.695E-06 | global batch size: 16 | lm loss: 8.001510E+00 | loss scale: 4096.0 | grad norm: 57981.800 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 383/ 159576 | consumed samples: 6128 | elapsed time per iteration (ms): 13688.0 | learning rate: 1.700E-06 | global batch size: 16 | lm loss: 8.043833E+00 | loss scale: 4096.0 | grad norm: 40631.885 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 384/ 159576 | consumed samples: 6144 | elapsed time per iteration (ms): 13680.4 | learning rate: 1.704E-06 | global batch size: 16 | lm loss: 8.029270E+00 | loss scale: 4096.0 | grad norm: 31579.477 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 385/ 159576 | consumed samples: 6160 | elapsed time per iteration (ms): 14057.5 | learning rate: 1.709E-06 | global batch size: 16 | lm loss: 8.156369E+00 | loss scale: 4096.0 | grad norm: 87842.060 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 386/ 159576 | consumed samples: 6176 | elapsed time per iteration (ms): 13765.1 | learning rate: 1.713E-06 | global batch size: 16 | lm loss: 8.024692E+00 | loss scale: 4096.0 | grad norm: 56881.857 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 387/ 159576 | consumed samples: 6192 | elapsed time per iteration (ms): 13768.8 | learning rate: 1.717E-06 | global batch size: 16 | lm loss: 7.997876E+00 | loss scale: 4096.0 | grad norm: 31105.819 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 388/ 159576 | consumed samples: 6208 | elapsed time per iteration (ms): 13433.5 | learning rate: 1.722E-06 | global batch size: 16 | lm loss: 7.985063E+00 | loss scale: 4096.0 | grad norm: 78090.353 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 389/ 159576 | consumed samples: 6224 | elapsed time per iteration (ms): 13675.2 | learning rate: 1.726E-06 | global batch size: 16 | lm loss: 7.926050E+00 | loss scale: 4096.0 | grad norm: 61534.683 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 390/ 159576 | consumed samples: 6240 | elapsed time per iteration (ms): 13989.4 | learning rate: 1.731E-06 | global batch size: 16 | lm loss: 7.938218E+00 | loss scale: 4096.0 | grad norm: 37749.344 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 391/ 159576 | consumed samples: 6256 | elapsed time per iteration (ms): 13663.4 | learning rate: 1.735E-06 | global batch size: 16 | lm loss: 7.835842E+00 | loss scale: 4096.0 | grad norm: 48700.287 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 392/ 159576 | consumed samples: 6272 | elapsed time per iteration (ms): 13682.5 | learning rate: 1.740E-06 | global batch size: 16 | lm loss: 7.976984E+00 | loss scale: 4096.0 | grad norm: 45273.731 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 393/ 159576 | consumed samples: 6288 | elapsed time per iteration (ms): 13680.3 | learning rate: 1.744E-06 | global batch size: 16 | lm loss: 8.063533E+00 | loss scale: 4096.0 | grad norm: 62966.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 394/ 159576 | consumed samples: 6304 | elapsed time per iteration (ms): 14158.6 | learning rate: 1.749E-06 | global batch size: 16 | lm loss: 7.962408E+00 | loss scale: 4096.0 | grad norm: 38917.941 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 395/ 159576 | consumed samples: 6320 | elapsed time per iteration (ms): 13412.3 | learning rate: 1.753E-06 | global batch size: 16 | lm loss: 7.930057E+00 | loss scale: 4096.0 | grad norm: 59046.433 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 396/ 159576 | consumed samples: 6336 | elapsed time per iteration (ms): 13631.9 | learning rate: 1.757E-06 | global batch size: 16 | lm loss: 8.137497E+00 | loss scale: 4096.0 | grad norm: 51299.741 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 397/ 159576 | consumed samples: 6352 | elapsed time per iteration (ms): 13706.0 | learning rate: 1.762E-06 | global batch size: 16 | lm loss: 8.020626E+00 | loss scale: 4096.0 | grad norm: 37056.313 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 398/ 159576 | consumed samples: 6368 | elapsed time per iteration (ms): 14158.0 | learning rate: 1.766E-06 | global batch size: 16 | lm loss: 8.114269E+00 | loss scale: 4096.0 | grad norm: 64105.827 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 399/ 159576 | consumed samples: 6384 | elapsed time per iteration (ms): 13628.9 | learning rate: 1.771E-06 | global batch size: 16 | lm loss: 8.186448E+00 | loss scale: 4096.0 | grad norm: 55633.908 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 400/ 159576 | consumed samples: 6400 | elapsed time per iteration (ms): 13727.5 | learning rate: 1.775E-06 | global batch size: 16 | lm loss: 8.182411E+00 | loss scale: 4096.0 | grad norm: 51312.945 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 401/ 159576 | consumed samples: 6416 | elapsed time per iteration (ms): 13749.7 | learning rate: 1.780E-06 | global batch size: 16 | lm loss: 8.020710E+00 | loss scale: 4096.0 | grad norm: 32983.756 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 402/ 159576 | consumed samples: 6432 | elapsed time per iteration (ms): 13473.4 | learning rate: 1.784E-06 | global batch size: 16 | lm loss: 7.970335E+00 | loss scale: 4096.0 | grad norm: 70699.597 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 403/ 159576 | consumed samples: 6448 | elapsed time per iteration (ms): 13904.7 | learning rate: 1.788E-06 | global batch size: 16 | lm loss: 7.993033E+00 | loss scale: 4096.0 | grad norm: 67107.513 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 404/ 159576 | consumed samples: 6464 | elapsed time per iteration (ms): 13683.9 | learning rate: 1.793E-06 | global batch size: 16 | lm loss: 8.091874E+00 | loss scale: 4096.0 | grad norm: 26716.683 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 405/ 159576 | consumed samples: 6480 | elapsed time per iteration (ms): 13642.3 | learning rate: 1.797E-06 | global batch size: 16 | lm loss: 8.088682E+00 | loss scale: 4096.0 | grad norm: 74507.909 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 406/ 159576 | consumed samples: 6496 | elapsed time per iteration (ms): 13688.7 | learning rate: 1.802E-06 | global batch size: 16 | lm loss: 8.134460E+00 | loss scale: 4096.0 | grad norm: 64155.050 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 407/ 159576 | consumed samples: 6512 | elapsed time per iteration (ms): 14175.7 | learning rate: 1.806E-06 | global batch size: 16 | lm loss: 8.105555E+00 | loss scale: 4096.0 | grad norm: 39464.479 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 408/ 159576 | consumed samples: 6528 | elapsed time per iteration (ms): 13703.7 | learning rate: 1.811E-06 | global batch size: 16 | lm loss: 7.988219E+00 | loss scale: 4096.0 | grad norm: 39779.639 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 409/ 159576 | consumed samples: 6544 | elapsed time per iteration (ms): 13499.5 | learning rate: 1.815E-06 | global batch size: 16 | lm loss: 7.931721E+00 | loss scale: 4096.0 | grad norm: 46421.169 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 410/ 159576 | consumed samples: 6560 | elapsed time per iteration (ms): 13608.5 | learning rate: 1.820E-06 | global batch size: 16 | lm loss: 7.944845E+00 | loss scale: 4096.0 | grad norm: 28537.165 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 411/ 159576 | consumed samples: 6576 | elapsed time per iteration (ms): 14088.6 | learning rate: 1.824E-06 | global batch size: 16 | lm loss: 7.955441E+00 | loss scale: 4096.0 | grad norm: 68818.472 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 412/ 159576 | consumed samples: 6592 | elapsed time per iteration (ms): 13613.5 | learning rate: 1.828E-06 | global batch size: 16 | lm loss: 8.293702E+00 | loss scale: 4096.0 | grad norm: 73315.445 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 413/ 159576 | consumed samples: 6608 | elapsed time per iteration (ms): 13670.1 | learning rate: 1.833E-06 | global batch size: 16 | lm loss: 7.982622E+00 | loss scale: 4096.0 | grad norm: 40882.033 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 414/ 159576 | consumed samples: 6624 | elapsed time per iteration (ms): 13753.2 | learning rate: 1.837E-06 | global batch size: 16 | lm loss: 7.981937E+00 | loss scale: 4096.0 | grad norm: 34929.207 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 415/ 159576 | consumed samples: 6640 | elapsed time per iteration (ms): 13749.7 | learning rate: 1.842E-06 | global batch size: 16 | lm loss: 8.060836E+00 | loss scale: 4096.0 | grad norm: 47572.261 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 416/ 159576 | consumed samples: 6656 | elapsed time per iteration (ms): 13758.6 | learning rate: 1.846E-06 | global batch size: 16 | lm loss: 8.002974E+00 | loss scale: 4096.0 | grad norm: 37872.224 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 417/ 159576 | consumed samples: 6672 | elapsed time per iteration (ms): 13599.2 | learning rate: 1.851E-06 | global batch size: 16 | lm loss: 7.972270E+00 | loss scale: 4096.0 | grad norm: 44233.921 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 418/ 159576 | consumed samples: 6688 | elapsed time per iteration (ms): 13571.0 | learning rate: 1.855E-06 | global batch size: 16 | lm loss: 8.249717E+00 | loss scale: 4096.0 | grad norm: 60770.929 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 419/ 159576 | consumed samples: 6704 | elapsed time per iteration (ms): 13598.5 | learning rate: 1.859E-06 | global batch size: 16 | lm loss: 7.861569E+00 | loss scale: 4096.0 | grad norm: 31277.711 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 420/ 159576 | consumed samples: 6720 | elapsed time per iteration (ms): 14077.1 | learning rate: 1.864E-06 | global batch size: 16 | lm loss: 7.965170E+00 | loss scale: 4096.0 | grad norm: 72793.609 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 421/ 159576 | consumed samples: 6736 | elapsed time per iteration (ms): 13383.0 | learning rate: 1.868E-06 | global batch size: 16 | lm loss: 7.907632E+00 | loss scale: 4096.0 | grad norm: 60405.796 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 422/ 159576 | consumed samples: 6752 | elapsed time per iteration (ms): 13739.1 | learning rate: 1.873E-06 | global batch size: 16 | lm loss: 8.041030E+00 | loss scale: 4096.0 | grad norm: 49156.237 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 423/ 159576 | consumed samples: 6768 | elapsed time per iteration (ms): 13364.3 | learning rate: 1.877E-06 | global batch size: 16 | lm loss: 7.965994E+00 | loss scale: 4096.0 | grad norm: 37382.408 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 424/ 159576 | consumed samples: 6784 | elapsed time per iteration (ms): 13509.2 | learning rate: 1.882E-06 | global batch size: 16 | lm loss: 7.979969E+00 | loss scale: 4096.0 | grad norm: 30214.011 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 425/ 159576 | consumed samples: 6800 | elapsed time per iteration (ms): 13784.5 | learning rate: 1.886E-06 | global batch size: 16 | lm loss: 7.877289E+00 | loss scale: 4096.0 | grad norm: 31571.817 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 426/ 159576 | consumed samples: 6816 | elapsed time per iteration (ms): 13491.5 | learning rate: 1.891E-06 | global batch size: 16 | lm loss: 8.049381E+00 | loss scale: 4096.0 | grad norm: 61185.189 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 427/ 159576 | consumed samples: 6832 | elapsed time per iteration (ms): 13530.6 | learning rate: 1.895E-06 | global batch size: 16 | lm loss: 7.963693E+00 | loss scale: 4096.0 | grad norm: 45639.191 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 428/ 159576 | consumed samples: 6848 | elapsed time per iteration (ms): 13594.4 | learning rate: 1.899E-06 | global batch size: 16 | lm loss: 7.874112E+00 | loss scale: 4096.0 | grad norm: 34163.218 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 429/ 159576 | consumed samples: 6864 | elapsed time per iteration (ms): 14157.2 | learning rate: 1.904E-06 | global batch size: 16 | lm loss: 8.141135E+00 | loss scale: 4096.0 | grad norm: 43864.273 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 430/ 159576 | consumed samples: 6880 | elapsed time per iteration (ms): 13539.3 | learning rate: 1.908E-06 | global batch size: 16 | lm loss: 7.883408E+00 | loss scale: 4096.0 | grad norm: 38957.139 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 431/ 159576 | consumed samples: 6896 | elapsed time per iteration (ms): 13542.5 | learning rate: 1.913E-06 | global batch size: 16 | lm loss: 7.858832E+00 | loss scale: 4096.0 | grad norm: 26292.591 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 432/ 159576 | consumed samples: 6912 | elapsed time per iteration (ms): 13843.5 | learning rate: 1.917E-06 | global batch size: 16 | lm loss: 7.901114E+00 | loss scale: 4096.0 | grad norm: 65782.734 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 433/ 159576 | consumed samples: 6928 | elapsed time per iteration (ms): 13570.9 | learning rate: 1.922E-06 | global batch size: 16 | lm loss: 8.025250E+00 | loss scale: 4096.0 | grad norm: 99671.911 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 434/ 159576 | consumed samples: 6944 | elapsed time per iteration (ms): 13645.1 | learning rate: 1.926E-06 | global batch size: 16 | lm loss: 7.512252E+00 | loss scale: 4096.0 | grad norm: 55130.336 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 435/ 159576 | consumed samples: 6960 | elapsed time per iteration (ms): 13607.8 | learning rate: 1.930E-06 | global batch size: 16 | lm loss: 7.858408E+00 | loss scale: 4096.0 | grad norm: 33670.129 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 436/ 159576 | consumed samples: 6976 | elapsed time per iteration (ms): 13679.8 | learning rate: 1.935E-06 | global batch size: 16 | lm loss: 7.844939E+00 | loss scale: 4096.0 | grad norm: 39814.378 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 437/ 159576 | consumed samples: 6992 | elapsed time per iteration (ms): 13689.9 | learning rate: 1.939E-06 | global batch size: 16 | lm loss: 8.013271E+00 | loss scale: 4096.0 | grad norm: 62672.031 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 438/ 159576 | consumed samples: 7008 | elapsed time per iteration (ms): 13781.3 | learning rate: 1.944E-06 | global batch size: 16 | lm loss: 7.903483E+00 | loss scale: 4096.0 | grad norm: 41414.951 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 439/ 159576 | consumed samples: 7024 | elapsed time per iteration (ms): 13527.3 | learning rate: 1.948E-06 | global batch size: 16 | lm loss: 8.131282E+00 | loss scale: 4096.0 | grad norm: 32283.331 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 440/ 159576 | consumed samples: 7040 | elapsed time per iteration (ms): 13501.3 | learning rate: 1.953E-06 | global batch size: 16 | lm loss: 7.865626E+00 | loss scale: 4096.0 | grad norm: 35041.386 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 441/ 159576 | consumed samples: 7056 | elapsed time per iteration (ms): 13519.5 | learning rate: 1.957E-06 | global batch size: 16 | lm loss: 7.741554E+00 | loss scale: 4096.0 | grad norm: 36249.919 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 442/ 159576 | consumed samples: 7072 | elapsed time per iteration (ms): 14043.2 | learning rate: 1.962E-06 | global batch size: 16 | lm loss: 7.954229E+00 | loss scale: 4096.0 | grad norm: 73161.393 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 443/ 159576 | consumed samples: 7088 | elapsed time per iteration (ms): 13566.1 | learning rate: 1.966E-06 | global batch size: 16 | lm loss: 7.943119E+00 | loss scale: 4096.0 | grad norm: 46167.002 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 444/ 159576 | consumed samples: 7104 | elapsed time per iteration (ms): 13755.3 | learning rate: 1.970E-06 | global batch size: 16 | lm loss: 7.861948E+00 | loss scale: 4096.0 | grad norm: 37826.022 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 445/ 159576 | consumed samples: 7120 | elapsed time per iteration (ms): 13434.4 | learning rate: 1.975E-06 | global batch size: 16 | lm loss: 7.838496E+00 | loss scale: 4096.0 | grad norm: 56817.525 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 446/ 159576 | consumed samples: 7136 | elapsed time per iteration (ms): 13607.2 | learning rate: 1.979E-06 | global batch size: 16 | lm loss: 7.932389E+00 | loss scale: 4096.0 | grad norm: 38213.438 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 447/ 159576 | consumed samples: 7152 | elapsed time per iteration (ms): 14012.8 | learning rate: 1.984E-06 | global batch size: 16 | lm loss: 7.808257E+00 | loss scale: 4096.0 | grad norm: 37539.445 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 448/ 159576 | consumed samples: 7168 | elapsed time per iteration (ms): 13428.4 | learning rate: 1.988E-06 | global batch size: 16 | lm loss: 7.818873E+00 | loss scale: 4096.0 | grad norm: 58774.552 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 449/ 159576 | consumed samples: 7184 | elapsed time per iteration (ms): 13533.7 | learning rate: 1.993E-06 | global batch size: 16 | lm loss: 8.147743E+00 | loss scale: 4096.0 | grad norm: 62996.237 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 450/ 159576 | consumed samples: 7200 | elapsed time per iteration (ms): 13606.8 | learning rate: 1.997E-06 | global batch size: 16 | lm loss: 8.094215E+00 | loss scale: 4096.0 | grad norm: 28180.185 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 451/ 159576 | consumed samples: 7216 | elapsed time per iteration (ms): 14132.6 | learning rate: 2.001E-06 | global batch size: 16 | lm loss: 7.781518E+00 | loss scale: 4096.0 | grad norm: 44504.183 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 452/ 159576 | consumed samples: 7232 | elapsed time per iteration (ms): 13368.4 | learning rate: 2.006E-06 | global batch size: 16 | lm loss: 8.044688E+00 | loss scale: 4096.0 | grad norm: 88794.745 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 453/ 159576 | consumed samples: 7248 | elapsed time per iteration (ms): 13584.3 | learning rate: 2.010E-06 | global batch size: 16 | lm loss: 7.851390E+00 | loss scale: 4096.0 | grad norm: 63860.892 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 454/ 159576 | consumed samples: 7264 | elapsed time per iteration (ms): 13723.9 | learning rate: 2.015E-06 | global batch size: 16 | lm loss: 7.919715E+00 | loss scale: 4096.0 | grad norm: 52314.539 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 455/ 159576 | consumed samples: 7280 | elapsed time per iteration (ms): 13869.1 | learning rate: 2.019E-06 | global batch size: 16 | lm loss: 7.873841E+00 | loss scale: 4096.0 | grad norm: 34440.715 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 456/ 159576 | consumed samples: 7296 | elapsed time per iteration (ms): 13582.9 | learning rate: 2.024E-06 | global batch size: 16 | lm loss: 8.021425E+00 | loss scale: 4096.0 | grad norm: 38108.651 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 457/ 159576 | consumed samples: 7312 | elapsed time per iteration (ms): 13563.2 | learning rate: 2.028E-06 | global batch size: 16 | lm loss: 8.019066E+00 | loss scale: 4096.0 | grad norm: 24882.231 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 458/ 159576 | consumed samples: 7328 | elapsed time per iteration (ms): 13638.8 | learning rate: 2.033E-06 | global batch size: 16 | lm loss: 8.016552E+00 | loss scale: 4096.0 | grad norm: 20634.945 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 459/ 159576 | consumed samples: 7344 | elapsed time per iteration (ms): 13616.8 | learning rate: 2.037E-06 | global batch size: 16 | lm loss: 7.754219E+00 | loss scale: 4096.0 | grad norm: 43242.810 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 460/ 159576 | consumed samples: 7360 | elapsed time per iteration (ms): 13985.2 | learning rate: 2.041E-06 | global batch size: 16 | lm loss: 7.788671E+00 | loss scale: 4096.0 | grad norm: 38608.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 461/ 159576 | consumed samples: 7376 | elapsed time per iteration (ms): 13736.9 | learning rate: 2.046E-06 | global batch size: 16 | lm loss: 7.806537E+00 | loss scale: 4096.0 | grad norm: 32594.750 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 462/ 159576 | consumed samples: 7392 | elapsed time per iteration (ms): 13386.0 | learning rate: 2.050E-06 | global batch size: 16 | lm loss: 7.940393E+00 | loss scale: 4096.0 | grad norm: 27037.582 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 463/ 159576 | consumed samples: 7408 | elapsed time per iteration (ms): 13564.9 | learning rate: 2.055E-06 | global batch size: 16 | lm loss: 7.988055E+00 | loss scale: 4096.0 | grad norm: 27394.266 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 464/ 159576 | consumed samples: 7424 | elapsed time per iteration (ms): 14013.6 | learning rate: 2.059E-06 | global batch size: 16 | lm loss: 8.004810E+00 | loss scale: 4096.0 | grad norm: 43759.686 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 465/ 159576 | consumed samples: 7440 | elapsed time per iteration (ms): 13546.2 | learning rate: 2.064E-06 | global batch size: 16 | lm loss: 7.704327E+00 | loss scale: 4096.0 | grad norm: 30191.115 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 466/ 159576 | consumed samples: 7456 | elapsed time per iteration (ms): 13671.9 | learning rate: 2.068E-06 | global batch size: 16 | lm loss: 7.774131E+00 | loss scale: 4096.0 | grad norm: 26963.554 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 467/ 159576 | consumed samples: 7472 | elapsed time per iteration (ms): 13643.6 | learning rate: 2.072E-06 | global batch size: 16 | lm loss: 7.856277E+00 | loss scale: 4096.0 | grad norm: 19255.502 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 468/ 159576 | consumed samples: 7488 | elapsed time per iteration (ms): 13848.0 | learning rate: 2.077E-06 | global batch size: 16 | lm loss: 7.999278E+00 | loss scale: 4096.0 | grad norm: 61835.255 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 469/ 159576 | consumed samples: 7504 | elapsed time per iteration (ms): 13946.4 | learning rate: 2.081E-06 | global batch size: 16 | lm loss: 7.747583E+00 | loss scale: 4096.0 | grad norm: 42910.556 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 470/ 159576 | consumed samples: 7520 | elapsed time per iteration (ms): 13471.2 | learning rate: 2.086E-06 | global batch size: 16 | lm loss: 7.847405E+00 | loss scale: 4096.0 | grad norm: 29043.806 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 471/ 159576 | consumed samples: 7536 | elapsed time per iteration (ms): 13595.6 | learning rate: 2.090E-06 | global batch size: 16 | lm loss: 7.886540E+00 | loss scale: 4096.0 | grad norm: 22573.188 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 472/ 159576 | consumed samples: 7552 | elapsed time per iteration (ms): 13582.6 | learning rate: 2.095E-06 | global batch size: 16 | lm loss: 7.949501E+00 | loss scale: 4096.0 | grad norm: 81307.755 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 473/ 159576 | consumed samples: 7568 | elapsed time per iteration (ms): 13977.1 | learning rate: 2.099E-06 | global batch size: 16 | lm loss: 7.798001E+00 | loss scale: 4096.0 | grad norm: 27221.701 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 474/ 159576 | consumed samples: 7584 | elapsed time per iteration (ms): 13666.7 | learning rate: 2.104E-06 | global batch size: 16 | lm loss: 7.990824E+00 | loss scale: 4096.0 | grad norm: 50253.500 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
saving checkpoint at iteration 474 to /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints
[2021-09-24 04:00:46,754] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/global_step474/mp_rank_00_model_states.pt
successfully saved checkpoint at iteration 474 to
/gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints
time (ms) | save-checkpoint: 17639.87
[exiting program after 110.0032222946485 minutes] datetime: 2021-09-24 04:00:58
--------------------------------------------------
DeepSpeed C++/CUDA extension op report
--------------------------------------------------
NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meets the required dependencies to JIT install the op.
--------------------------------------------------
JIT compiled ops requires ninja
ninja .................. [OKAY]
--------------------------------------------------
op name ................ installed .. compatible
--------------------------------------------------
cpu_adam ............... [YES] ...... [OKAY]
fused_adam ............. [NO] ....... [OKAY]
fused_lamb ............. [NO] ....... [OKAY]
sparse_attn ............ [NO] .......
[OKAY]
transformer ............ [NO] ....... [OKAY]
stochastic_transformer . [NO] ....... [OKAY]
--------------------------------------------------
 [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`.
async_io ............... [NO] ....... [NO]
transformer_inference .. [NO] ....... [OKAY]
utils .................. [YES] ...... [OKAY]
quantizer .............. [NO] .......
[OKAY] -------------------------------------------------- ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... 
[OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. 
-------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. ninja .................. [OKAY] async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] -------------------------------------------------- utils .................. [YES] ...... [OKAY] op name ................ installed .. compatible quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. 
[NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] ninja .................. [OKAY] -------------------------------------------------- -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- op name ................ installed .. compatible cpu_adam ............... [YES] ...... [OKAY] -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] async_io ............... [NO] ....... [NO] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] transformer ............ [NO] ....... 
[OKAY] transformer_inference .. [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report ninja .................. [OKAY] -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] ninja .................. 
[OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. 
-------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report stochastic_transformer . [NO] ....... [OKAY] -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. 
compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. 
compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. 
-------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] ninja .................. [OKAY] transformer ............ [NO] ....... [OKAY] -------------------------------------------------- stochastic_transformer . [NO] ....... [OKAY] op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] -------------------------------------------------- transformer ............ [NO] ....... [OKAY] DeepSpeed C++/CUDA extension op report -------------------------------------------------- stochastic_transformer . [NO] ....... 
[OKAY] NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] -------------------------------------------------- sparse_attn ............ [NO] ....... [OKAY] DeepSpeed C++/CUDA extension op report -------------------------------------------------- ninja .................. [OKAY] transformer ............ [NO] ....... [OKAY] NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- stochastic_transformer . [NO] ....... [OKAY] op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... 
[OKAY] stochastic_transformer . [NO] ....... [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- ninja .................. [OKAY] cpu_adam ............... [YES] ...... [OKAY] -------------------------------------------------- fused_adam ............. [NO] ....... [OKAY] op name ................ installed .. compatible -------------------------------------------------- fused_lamb ............. [NO] ....... [OKAY] cpu_adam ............... [YES] ...... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... 
[OKAY] -------------------------------------------------- ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`.  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... async_io[NO] ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY]transformer_inference .. [NO] .......utils [OKAY].................. [YES] ...... [OKAY] utils ..................quantizer [YES].............. ......[NO] [OKAY]....... [OKAY] quantizer .............. --------------------------------------------------[NO] ....... [OKAY] -------------------------------------------------- -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ 
[NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- ninja .................. 
[OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... 
--------------------------------------------------
DeepSpeed C++/CUDA extension op report
--------------------------------------------------
NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.
--------------------------------------------------
JIT compiled ops requires ninja
 [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`.
async_io ............... [NO] ....... [NO]
transformer_inference .. [NO] ....... [OKAY]
utils .................. [YES] ...... [OKAY]
quantizer .............. [NO] ....... [OKAY]
--------------------------------------------------
ninja .................. [OKAY]
--------------------------------------------------
op name ................ installed .. compatible
--------------------------------------------------
cpu_adam ............... [YES] ...... [OKAY]
fused_adam ............. [NO] ....... [OKAY]
fused_lamb ............. [NO] ....... [OKAY]
sparse_attn ............ [NO] ....... [OKAY]
transformer ............ [NO] ....... [OKAY]
stochastic_transformer . [NO] ....... [OKAY]
--------------------------------------------------
DeepSpeed general environment info:
torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']
torch version .................... 1.8.1
torch cuda version ............... 11.1
nvcc version ..................... 11.2
deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']
deepspeed info ................... 0.4.2+bc17042, bc17042, big-science
deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1
torch 1.8, cuda 11.1 -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`.  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. 
async_io ............... [NO] ....... [NO] async_io ............... [NO] ....... transformer_inference[NO] .. [NO] ....... [OKAY] utils .................. [YES]transformer_inference ........ [OKAY][NO] ....... quantizer .............. [OKAY][NO] ....... [OKAY] utils-------------------------------------------------- .................. [YES] ...... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 ninja .................. [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. op name ................ installed .. compatible -------------------------------------------------- async_io ............... [NO] ....... [NO] cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] transformer_inference .. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] utils .................. [YES] ...... 
[OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... 
[OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- ninja .................. [OKAY] JIT compiled ops requires ninja -------------------------------------------------- ninja .................. [OKAY] op name ................ installed .. compatible -------------------------------------------------- -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] op name ................ installed .. compatible -------------------------------------------------- fused_adam ............. [NO] ....... [OKAY] cpu_adam ............... [YES] ...... [OKAY] fused_lamb ............. [NO] ....... [OKAY] fused_adam ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 
11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... 
[OKAY] -------------------------------------------------- -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... 
[OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] DeepSpeed general environment info: utils .................. [YES] ...... [OKAY] torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] quantizer .............. [NO] ....... [OKAY] torch version .................... 1.8.1 torch cuda version ............... 11.1 -------------------------------------------------- nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... 
[OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed general environment info:DeepSpeed general environment info: torch install pathtorch install path ............... ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version torch version.................... ....................1.8.1 1.8.1 torch cuda version torch cuda version............... ...............11.1 11.1 nvcc versionnvcc version .......................................... 11.211.2 deepspeed install pathdeepspeed install path ...................... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed infodeepspeed info ...................................... 0.4.2+bc17042, bc17042, big-science0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w.deepspeed wheel compiled w. ............ torch 1.8, cuda 11.1torch 1.8, cuda 11.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io async_io............... ...............[NO] [NO]....... .......[NO] [NO] DeepSpeed general environment info: transformer_inference .. [NO]transformer_inference ......... [OKAY][NO] ....... [OKAY] torch install path ............... 
['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 utils ..................utils [YES].................. [YES]...... ......[OKAY] [OKAY] deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 quantizer .............. quantizer[NO] ..................... [NO][OKAY] ....... [OKAY] -------------------------------------------------- -------------------------------------------------- -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] DeepSpeed general environment info: transformer_inference .. [NO] ....... [OKAY] torch install path ............... 
['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 utils .................. [YES] ...... [OKAY] torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 -------------------------------------------------- JIT compiled ops requires ninja DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 DeepSpeed general environment info: nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 torch version .................... 1.8.1 torch cuda version ............... 11.1 -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. 
Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- -------------------------------------------------- DeepSpeed C++/CUDA extension op report op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. fused_adam ............. [NO] ....... [OKAY] -------------------------------------------------- JIT compiled ops requires ninja fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... 
[OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- DeepSpeed general environment info: JIT compiled ops requires ninja torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... 
[OKAY] -------------------------------------------------- -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 async_io ............... [NO] ....... [NO] deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] ninja .................. [OKAY] transformer_inference .. [NO] ....... [OKAY] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 -------------------------------------------------- utils .................. [YES] ...... [OKAY] op name ................ installed .. compatible -------------------------------------------------- quantizer .............. [NO] ....... [OKAY] cpu_adam ............... [YES] ...... [OKAY] -------------------------------------------------- fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 
1.8.1
torch cuda version ............... 11.1
nvcc version ..................... 11.2
deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']
deepspeed info ................... 0.4.2+bc17042, bc17042, big-science
deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1

--------------------------------------------------
DeepSpeed C++/CUDA extension op report
--------------------------------------------------
NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.
--------------------------------------------------
JIT compiled ops requires ninja

ninja .................. [OKAY]
--------------------------------------------------
op name ................ installed .. compatible
--------------------------------------------------
cpu_adam ............... [YES] ...... [OKAY]
fused_adam ............. [NO] ....... [OKAY]
fused_lamb ............. [NO] ....... [OKAY]
sparse_attn ............ [NO] ....... [OKAY]
transformer ............ [NO] ....... [OKAY]
stochastic_transformer . [NO] ....... [OKAY]
--------------------------------------------------

 [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`.
async_io ............... [NO] ....... [NO]
transformer_inference .. [NO] ....... [OKAY]
utils .................. [YES] ...... [OKAY]
quantizer .............. [NO] ....... [OKAY]
--------------------------------------------------

DeepSpeed general environment info:
torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']
torch version .................... 1.8.1
torch cuda version ............... 11.1
nvcc version ..................... 11.2
deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']
deepspeed info ................... 0.4.2+bc17042, bc17042, big-science
deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1

/bin/sh: line 0: type: git: not found
**** Git info for Megatron: git_hash=unknown git_branch=unknown ****
Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] ninja .................. [OKAY] -------------------------------------------------- utils .................. [YES] ...... [OKAY] op name ................ installed .. compatible quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] ninja .................. [OKAY] -------------------------------------------------- fused_lamb ............. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. 
-------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- sparse_attn ............ [NO] ....... [OKAY] cpu_adam ............... [YES] ...... [OKAY] transformer ............ [NO] ....... [OKAY] JIT compiled ops requires ninja fused_adam ............. [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] ninja .................. [OKAY] torch version .................... 1.8.1 torch cuda version ............... 11.1 -------------------------------------------------- nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] op name ................ installed .. compatible -------------------------------------------------- deepspeed info ................... 0.4.2+bc17042, bc17042, big-science cpu_adam ............... [YES] ...... [OKAY] deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... 
[OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] /bin/sh: line 0: type: git: not found ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] async_io ............... [NO] ....... [NO] sparse_attn ............ [NO] ....... [OKAY] **** Git info for Megatron: git_hash=unknown git_branch=unknown **** transformer_inference .. [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... 
['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`.  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science async_io ............... [NO] ....... [NO]transformer_inference deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 .. [NO] ....... [OKAY] utilstransformer_inference .................... [YES][NO] ............. [OKAY][OKAY] /bin/sh: line 0: type: git: not found quantizer ..............utils [NO].................. .......[YES] [OKAY]...... [OKAY] -------------------------------------------------- /bin/sh: line 0: type: git: not found quantizer .............. [NO] ....... 
[OKAY] -------------------------------------------------- -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system **** Git info for Megatron: git_hash=unknown git_branch=unknown **** meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. 
-------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] ninja .................. [OKAY] -------------------------------------------------- fused_lamb ............. [NO] ....... [OKAY] op name ................ installed .. compatible sparse_attn ............ [NO] ....... [OKAY] -------------------------------------------------- transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... 
[OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] **** Git info for Megatron: git_hash=unknown git_branch=unknown **** -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja DeepSpeed general environment info: /bin/sh: line 0: type: git: not found torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 DeepSpeed general environment info: torch cuda version ............... 11.1 nvcc version ..................... 
11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version DeepSpeed general environment info:..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']torch install path deepspeed info............... ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ......['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch 1.8, cuda 11.1 torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 **** Git info for Megatron: git_hash=unknown git_branch=unknown **** -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... 
[OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] -------------------------------------------------- transformer_inference .. [NO] ....... [OKAY] DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system utils .................. [YES] ...... [OKAY] meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. 
[NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] DeepSpeed general environment info: DeepSpeed general environment info:torch install path ............... torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version ....................['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] 1.8.1 torch versiontorch cuda version ................................... 1.8.111.1 nvcc versiontorch cuda version .................................... 11.211.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. /bin/sh: line 0: type: git: not found deepspeed install pathnvcc version ................................ 11.2['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed install pathdeepspeed info .............................. 0.4.2+bc17042, bc17042, big-science['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] async_io ............... [NO] ....... [NO] ninja .................. [OKAY] deepspeed wheel compiled w.deepspeed info ......................... torch 1.8, cuda 11.10.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 transformer_inference .. [NO] ....... [OKAY] -------------------------------------------------- utils .................. [YES] ...... [OKAY] op name ................ installed .. compatible -------------------------------------------------- quantizer .............. [NO] ....... [OKAY] cpu_adam ............... [YES] ...... [OKAY] -------------------------------------------------- fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... 
[OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] **** Git info for Megatron: git_hash=unknown git_branch=unknown ****  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] /bin/sh: line 0: type: git: not found -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. 
-------------------------------------------------- JIT compiled ops requires ninja **** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info:DeepSpeed general environment info: torch install pathtorch install path .............................. ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch versiontorch version ........................................ 1.8.11.8.1 torch cuda versiontorch cuda version .............................. 11.111.1 nvcc versionnvcc version .......................................... 11.211.2 deepspeed install pathdeepspeed install path ...................... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']deepspeed info deepspeed info................... ................... 0.4.2+bc17042, bc17042, big-science0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w.deepspeed wheel compiled w. ............ torch 1.8, cuda 11.1torch 1.8, cuda 11.1 -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. 
[OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. 
--------------------------------------------------
DeepSpeed C++/CUDA extension op report
--------------------------------------------------
NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.
--------------------------------------------------
JIT compiled ops requires ninja
ninja .................. [OKAY]
--------------------------------------------------
op name ................ installed .. compatible
--------------------------------------------------
cpu_adam ............... [YES] ...... [OKAY]
fused_adam ............. [NO] ....... [OKAY]
fused_lamb ............. [NO] ....... [OKAY]
sparse_attn ............ [NO] ....... [OKAY]
transformer ............ [NO] ....... [OKAY]
stochastic_transformer . [NO] ....... [OKAY]
transformer_inference .. [NO] ....... [OKAY]
async_io ............... [NO] ....... [NO]
utils .................. [YES] ...... [OKAY]
quantizer .............. [NO] ....... [OKAY]
--------------------------------------------------
 [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`.
/bin/sh: line 0: type: git: not found
**** Git info for Megatron: git_hash=unknown git_branch=unknown ****
DeepSpeed general environment info:
torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']
torch version .................... 1.8.1
torch cuda version ............... 11.1
nvcc version ..................... 11.2
deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']
deepspeed info ................... 0.4.2+bc17042, bc17042, big-science
deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1
['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] ninja .................. [OKAY] transformer_inference .. [NO] ....... [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- utils .................. [YES] ...... [OKAY] cpu_adam ............... [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] DeepSpeed general environment info: fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 sparse_attn ............ [NO] ....... 
[OKAY] torch cuda version ............... 11.1 nvcc version ..................... 11.2 transformer ............ [NO] ....... [OKAY] /bin/sh: line 0: type: git: not found deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 stochastic_transformer . [NO] ....... [OKAY] DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 **** Git info for Megatron: git_hash=unknown git_branch=unknown ****  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`.  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ...............async_io [NO]............... .......[NO] [NO]....... [NO] transformer_inference transformer_inference.. ..[NO] [NO]....... .......[OKAY] [OKAY] utils utils.................. ..................[YES] [YES]...... ......[OKAY] [OKAY] quantizerquantizer ............................ [NO][NO] .............. [OKAY][OKAY] ---------------------------------------------------------------------------------------------------- /bin/sh: line 0: type: git: not found  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... 
[NO] **** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info: transformer_inference .. [NO] ....... [OKAY] torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 utils .................. [YES] ...... [OKAY] nvcc version ..................... 11.2 quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... 
torch 1.8, cuda 11.1 DeepSpeed general environment info:DeepSpeed general environment info: torch install pathtorch install path .............................. ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch versiontorch version .................... ....................1.8.1 1.8.1 torch cuda version torch cuda version............... ...............11.1 11.1 nvcc version nvcc version..................... .....................11.2 11.2deepspeed install path deepspeed install path........... ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']deepspeed info  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. ...................deepspeed info 0.4.2+bc17042, bc17042, big-science................... 0.4.2+bc17042, bc17042, big-sciencedeepspeed wheel compiled w. deepspeed wheel compiled w....... ......torch 1.8, cuda 11.1 torch 1.8, cuda 11.1 async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 
11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] async_io ............... [NO]async_io ....... 
...............[NO] [NO] ....... [NO] transformer_inferencetransformer_inference .... [NO][NO] .............. [OKAY][OKAY] utilsutils .................................... [YES][YES] ............ [OKAY][OKAY] quantizerquantizer ............................ [NO][NO] ....... .......[OKAY] [OKAY] -------------------------------------------------- --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`.  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. utils .................. [YES] ...... [OKAY] async_io ...............async_io [NO] ...................... [NO][NO] ....... [NO] quantizer .............. [NO] ....... [OKAY] transformer_inference .. transformer_inference[NO] ......... [NO][OKAY] ....... [OKAY] -------------------------------------------------- utils .................. utils[YES] ........................ [YES][OKAY] ...... [OKAY] quantizer .............. [NO]quantizer ..................... [OKAY][NO] ....... [OKAY] -------------------------------------------------- -------------------------------------------------- DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 
1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info:  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. 
/bin/sh: line 0: type: git: not found torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] async_io ............... [NO] ....... [NO] torch version .................... 1.8.1 transformer_inference .. [NO] ....... [OKAY] torch cuda version ............... 11.1 nvcc version ..................... 11.2 utils .................. [YES] ...... [OKAY] deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science quantizer .............. [NO] ....... [OKAY] deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 -------------------------------------------------- **** Git info for Megatron: git_hash=unknown git_branch=unknown **** ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] DeepSpeed general environment info: torch install path ...............DeepSpeed general environment info: ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch install path ...............torch version .................... 1.8.1 ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']torch cuda version ............... 11.1torch version nvcc version.................... .....................1.8.1 11.2 torch cuda versiondeepspeed install path .......................... 11.1 ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] nvcc version deepspeed info..................... 
...................11.2 0.4.2+bc17042, bc17042, big-science deepspeed install path deepspeed wheel compiled w............ ...... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']torch 1.8, cuda 11.1 deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] -------------------------------------------------- quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. 
compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown ****  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- **** Git info for Megatron: git_hash=unknown git_branch=unknown **** -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report DeepSpeed general environment info: -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... 
['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... 
[OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. sparse_attn ............ [NO] ....... [OKAY] async_io ............... [NO] ....... [NO] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`....... [OKAY] DeepSpeed general environment info: quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] async_io ............... [NO] ....... [NO] ninja .................. [OKAY] torch version .................... 1.8.1 transformer_inference .. [NO] ....... [OKAY] -------------------------------------------------- torch cuda version ............... 11.1 op name ................ installed .. compatible -------------------------------------------------- nvcc version ..................... 11.2 utils .................. [YES] ...... [OKAY] cpu_adam ............... [YES] ...... [OKAY] deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] quantizer .............. [NO] ....... [OKAY] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 -------------------------------------------------- fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. 
[WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] .......async_io [NO] ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] transformer_inference utils.. ..................[NO] [YES]....... ......[OKAY] [OKAY] quantizerutils ................................ [NO][YES] ............. [OKAY][OKAY] -------------------------------------------------- quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... 
['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] **** Git info for Megatron: git_hash=unknown git_branch=unknown **** torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 /bin/sh: line 0: type: git: not found deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... 
torch 1.8, cuda 11.1 **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version .....................DeepSpeed general environment info: 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] torch install pathdeepspeed info .................................. 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_ioasync_io .............................. [NO][NO] .............. [NO][NO] transformer_inference .. 
transformer_inference[NO] ......... [OKAY][NO] ....... [OKAY] utils .................. [YES]utils ........................ [OKAY][YES] ...... [OKAY] quantizer .............. quantizer[NO] ..................... [NO][OKAY] ....... [OKAY] -------------------------------------------------- -------------------------------------------------- ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown ******** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] ninja .................. [OKAY] transformer_inference .. [NO] ....... 
[OKAY] -------------------------------------------------- utils .................. [YES] ...... [OKAY] op name ................ installed .. compatible -------------------------------------------------- quantizer .............. [NO] ....... [OKAY] cpu_adam ............... [YES] ...... [OKAY] -------------------------------------------------- fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found DeepSpeed general environment info: /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] DeepSpeed general environment info: torch version .................... 1.8.1 torch cuda version ............... 11.1 torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] **** Git info for Megatron: git_hash=unknown git_branch=unknown **** nvcc version ..................... 11.2 torch version .................... 1.8.1 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. 
...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch cuda version ............... 11.1 torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] torch version .................... 1.8.1 deepspeed info ................... 0.4.2+bc17042, bc17042, big-science torch cuda version ............... 11.1 deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown ****  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. transformer_inference .. [NO] ....... [OKAY] async_io ............... [NO] ....... [NO] utils .................. [YES] ...... [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`.transformer_inference .. 
quantizer .............. [NO] ....... [OKAY] [NO] ....... [OKAY] -------------------------------------------------- utils .................. [YES] async_io...... [OKAY]............... [NO] .......quantizer [NO] .............. [NO] ....... [OKAY] --------------------------------------------------transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown ******** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 **** Git info for Megatron: git_hash=unknown git_branch=unknown **** -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... 
[OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja /bin/sh: line 0: type: git: not found DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... 
torch 1.8, cuda 11.1 **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible /bin/sh: line 0: type: git: not found -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. 
Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... 
torch 1.8, cuda 11.1 -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. [OKAY] -------------------------------------------------- async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] /bin/sh: line 0: type: git: not found transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- **** Git info for Megatron: git_hash=unknown git_branch=unknown **** ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... 
[OKAY] stochastic_transformer . [NO] ....... [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... 
torch 1.8, cuda 11.1 **** Git info for Megatron: git_hash=unknown git_branch=unknown **** -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... 
torch 1.8, cuda 11.1 ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] DeepSpeed general environment info: stochastic_transformer . [NO] ....... [OKAY] torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... 
torch 1.8, cuda 11.1 -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... DeepSpeed general environment info:11.1 nvcc version ..................... 11.2 deepspeed install path ...........torch install path ...............['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] DeepSpeed general environment info: transformer_inference .. [NO] ....... [OKAY] torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 utils .................. [YES] ...... [OKAY] nvcc version ..................... 
11.2 quantizer .............. [NO] ....... [OKAY] deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science -------------------------------------------------- deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 
1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info:DeepSpeed general environment info: torch install pathtorch install path .............................. ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version torch version.................... ....................1.8.1 1.8.1 torch cuda version torch cuda version............... ...............11.1 nvcc version11.1 .....................nvcc version 11.2..................... deepspeed install path11.2 ........... deepspeed install path ...........['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... ...................torch 1.8, cuda 11.1 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown ****  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... 
[OKAY] -------------------------------------------------- /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown ******** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info: DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch version .................... 1.8.1 torch cuda version ............... 11.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... 
torch 1.8, cuda 11.1

DeepSpeed general environment info:
torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']
torch version .................... 1.8.1
torch cuda version ............... 11.1
nvcc version ..................... 11.2
deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']
deepspeed info ................... 0.4.2+bc17042, bc17042, big-science
deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1

/bin/sh: line 0: type: git: not found
**** Git info for Megatron: git_hash=unknown git_branch=unknown ****

[WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`.
async_io ............... [NO] ....... [NO]
transformer_inference .. [NO] ....... [OKAY]
utils .................. [YES] ...... [OKAY]
quantizer .............. [NO] ....... [OKAY]
--------------------------------------------------

--------------------------------------------------
DeepSpeed C++/CUDA extension op report
--------------------------------------------------
NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.
--------------------------------------------------
JIT compiled ops requires ninja

ninja .................. [OKAY]
--------------------------------------------------
op name ................ installed .. compatible
--------------------------------------------------
cpu_adam ............... [YES] ...... [OKAY]
fused_adam ............. [NO] ....... [OKAY]
fused_lamb ............. [NO] ....... [OKAY]
sparse_attn ............ [NO] ....... [OKAY]
transformer ............ [NO] ....... [OKAY]
stochastic_transformer . [NO] ....... [OKAY]

DeepSpeed general environment info:
torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']
torch version .................... 1.8.1
torch cuda version ............... 11.1
nvcc version ..................... 11.2
deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']
deepspeed info ................... 0.4.2+bc17042, bc17042, big-science
deepspeed wheel compiled w. ......
torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- **** Git info for Megatron: git_hash=unknown git_branch=unknown **** JIT compiled ops requires ninja DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... 
[OKAY] /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... 
[OKAY] **** Git info for Megatron: git_hash=unknown git_branch=unknown ******** Git info for Megatron: git_hash=unknown git_branch=unknown **** -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja /bin/sh: line 0: type: git: not found DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 ninja .................. [OKAY] torch cuda version ............... 11.1 -------------------------------------------------- nvcc version ..................... 11.2 op name ................ installed .. compatible -------------------------------------------------- deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] cpu_adam ............... [YES] ...... [OKAY] **** Git info for Megatron: git_hash=unknown git_branch=unknown **** deepspeed info ................... 0.4.2+bc17042, bc17042, big-science fused_adam ............. [NO] ....... [OKAY] deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found fused_lamb ............. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- sparse_attn NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op............. --------------------------------------------------[NO] JIT compiled ops requires ninja....... [OKAY] transformer ............ 
[NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] **** Git info for Megatron: git_hash=unknown git_branch=unknown **** utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- **** Git info for Megatron: git_hash=unknown git_branch=unknown **** ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown ****  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. 
Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... 
[OKAY] stochastic_transformer . [NO] ....... [OKAY] /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 ---------------------------------------------------------------------------------------------------- DeepSpeed C++/CUDA extension op reportDeepSpeed C++/CUDA extension op report -------------------------------------------------- --------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.-------------------------------------------------- JIT compiled ops requires ninja-------------------------------------------------- JIT compiled ops requires ninja DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... 
['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- ninjaop name .................................. installed ..[OKAY] compatible ---------------------------------------------------------------------------------------------------- op name ................ installed .. cpu_adamcompatible ............... --------------------------------------------------[YES] ...... [OKAY] cpu_adam ............... [YES]fused_adam ................... [NO][OKAY] ....... [OKAY] fused_lamb ............. [NO] ....... fused_adam[OKAY] ............. [NO] ....... [OKAY] fused_lamb sparse_attn............. ............[NO] [NO] .............. [OKAY][OKAY] transformer ............ [NO] ....... [OKAY] sparse_attn ............stochastic_transformer [NO] ........ [NO] [OKAY]....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... 
[OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] /bin/sh: line 0: type: git: not found stochastic_transformer . [NO] ....... [OKAY] **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- **** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info: torch install path ............... 
['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ 
[NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] **** Git info for Megatron: git_hash=unknown git_branch=unknown **** transformer ............ [NO] ....... 
[OKAY] stochastic_transformer . [NO] ....... [OKAY] /bin/sh: line 0: type: git: not found ninja .................. [OKAY] -------------------------------------------------- **** Git info for Megatron: git_hash=unknown git_branch=unknown **** op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] DeepSpeed general environment info: torch install path ............... 
['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. 
compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown ******** Git info for Megatron: git_hash=unknown git_branch=unknown ****  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io async_io............... ...............[NO] .......[NO] [NO]....... [NO] transformer_inferencetransformer_inference .... [NO][NO] .............. [OKAY][OKAY] utils utils.................. ..................[YES] [YES]...... ......[OKAY] [OKAY] quantizer quantizer.............. ..............[NO] [NO]....... .......[OKAY] [OKAY] -------------------------------------------------- -------------------------------------------------- ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown ******** Git info for Megatron: git_hash=unknown git_branch=unknown ****  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. 
Can be fixed by: `apt install libaio-dev`.
 [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`.
async_io ............... [NO] ....... [NO]
transformer_inference .. [NO] ....... [OKAY]
utils .................. [YES] ...... [OKAY]
quantizer .............. [NO] ....... [OKAY]
--------------------------------------------------
/bin/sh: line 0: type: git: not found
**** Git info for Megatron: git_hash=unknown git_branch=unknown ****
DeepSpeed general environment info:
torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']
torch version .................... 1.8.1
torch cuda version ............... 11.1
nvcc version ..................... 11.2
deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']
deepspeed info ................... 0.4.2+bc17042, bc17042, big-science
deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1
--------------------------------------------------
DeepSpeed C++/CUDA extension op report
--------------------------------------------------
NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.
--------------------------------------------------
JIT compiled ops requires ninja
ninja .................. [OKAY]
--------------------------------------------------
op name ................ installed .. compatible
--------------------------------------------------
cpu_adam ............... [YES] ...... [OKAY]
fused_adam ............. [NO] ....... [OKAY]
fused_lamb ............. [NO] ....... [OKAY]
sparse_attn ............ [NO] ....... [OKAY]
transformer ............ [NO] ....... [OKAY]
stochastic_transformer . [NO] ....... [OKAY]
--------------------------------------------------
DeepSpeed general environment info:
torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']
torch version .................... 1.8.1
torch cuda version ............... 11.1
nvcc version ..................... 11.2
deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']
deepspeed info ...................
0.4.2+bc17042, bc17042, big-science0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w.deepspeed wheel compiled w. ............ torch 1.8, cuda 11.1torch 1.8, cuda 11.1 ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] **** Git info for Megatron: git_hash=unknown git_branch=unknown **** utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- **** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... 
torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown ******** Git info for Megatron: git_hash=unknown git_branch=unknown ****  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] /bin/sh: line 0: type: git: not found quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown ****  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown ******** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... 
[NO] **** Git info for Megatron: git_hash=unknown git_branch=unknown **** transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown ****  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] /bin/sh: line 0: type: git: not found quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown ****  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... 
[OKAY] -------------------------------------------------- /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown ******** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown ****  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown ******** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 
11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown ******** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... 
['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info:DeepSpeed general environment info: torch install pathtorch install path .............................. ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch versiontorch version ........................................ 1.8.11.8.1 torch cuda versiontorch cuda version .............................. 11.111.1 nvcc versionnvcc version .......................................... 11.211.2 deepspeed install pathdeepspeed install path ...................... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info deepspeed info................... ...................0.4.2+bc17042, bc17042, big-science 0.4.2+bc17042, bc17042, big-sciencedeepspeed wheel compiled w. deepspeed wheel compiled w....... 
......torch 1.8, cuda 11.1 torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... 
['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w.DeepSpeed general environment info: ...... torch 1.8, cuda 11.1 torch install path ............... 
['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** -------------------------------------------------- DeepSpeed C++/CUDA extension op report ---------------------------------------------------------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. DeepSpeed C++/CUDA extension op report -------------------------------------------------- --------------------------------------------------JIT compiled ops requires ninja NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. 
[OKAY] -------------------------------------------------- op name ................ installed .. compatible ninja-------------------------------------------------- .................. [OKAY] -------------------------------------------------- cpu_adamop name ............... ................[YES] installed...... ..[OKAY] compatible -------------------------------------------------- fused_adam cpu_adam............. [NO]............... ....... [YES][OKAY] ...... [OKAY] fused_lamb ............. [NO] ....... [OKAY] fused_adam ............. [NO] ....... [OKAY] sparse_attnfused_lamb ............ .............[NO] [NO]....... .......[OKAY] [OKAY] transformer ............ [NO] ....... [OKAY] sparse_attnstochastic_transformer ............. [NO][NO] .............. [OKAY][OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... 
[OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** > setting tensorboard ...  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... 
[OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ...............DeepSpeed general environment info: ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']torch install path ...............torch version .................... 1.8.1 torch cuda version['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] ............... 11.1torch version nvcc version.................... .....................1.8.1 11.2 deepspeed install pathtorch cuda version ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info............... ................... 11.10.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... 
torch 1.8, cuda 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ninja installed .................... [OKAY]compatible ---------------------------------------------------------------------------------------------------- op name ................ installed .. compatible cpu_adam-------------------------------------------------- ............... [YES] ...... [OKAY] cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_adamfused_lamb .......................... [NO][NO] .............. [OKAY][OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] sparse_attn transformer............ ............[NO] [NO]....... .......[OKAY] [OKAY] transformer ............ stochastic_transformer[NO] ....... .[OKAY] [NO] ....... [OKAY] stochastic_transformer . [NO] ....... 
--------------------------------------------------
DeepSpeed C++/CUDA extension op report
--------------------------------------------------
NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.
--------------------------------------------------
JIT compiled ops requires ninja
ninja .................. [OKAY]
--------------------------------------------------
op name ................ installed .. compatible
--------------------------------------------------
cpu_adam ............... [YES] ...... [OKAY]
fused_adam ............. [NO] ....... [OKAY]
fused_lamb ............. [NO] ....... [OKAY]
sparse_attn ............ [NO] ....... [OKAY]
transformer ............ [NO] ....... [OKAY]
stochastic_transformer . [NO] ....... [OKAY]
--------------------------------------------------

 [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`.
async_io ............... [NO] ....... [NO]
transformer_inference .. [NO] ....... [OKAY]
utils .................. [YES] ...... [OKAY]
quantizer .............. [NO] ....... [OKAY]
--------------------------------------------------

DeepSpeed general environment info:
torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']
torch version .................... 1.8.1
torch cuda version ............... 11.1
nvcc version ..................... 11.2
deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']
deepspeed info ................... 0.4.2+bc17042, bc17042, big-science
deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1

/bin/sh: line 0: type: git: not found
**** Git info for Megatron: git_hash=unknown git_branch=unknown ****

DeepSpeed general environment info:
torch install path ...............
['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info:DeepSpeed general environment info: torch install pathtorch install path .............................. ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch versiontorch version ........................................ 1.8.11.8.1 torch cuda versiontorch cuda version .............................. 11.111.1 nvcc versionnvcc version .......................................... 11.211.2 deepspeed install pathdeepspeed install path ...................... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed infodeepspeed info ...................................... 0.4.2+bc17042, bc17042, big-science0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w.deepspeed wheel compiled w. ............ 
torch 1.8, cuda 11.1torch 1.8, cuda 11.1 DeepSpeed general environment info:DeepSpeed general environment info: torch install pathtorch install path .............................. ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch versiontorch version ........................................ 1.8.11.8.1 torch cuda versiontorch cuda version .............................. 11.111.1 nvcc versionnvcc version .......................................... 11.211.2 deepspeed install pathdeepspeed install path ...................... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed infodeepspeed info ...................................... 0.4.2+bc17042, bc17042, big-science0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w.deepspeed wheel compiled w. ............ torch 1.8, cuda 11.1torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... DeepSpeed general environment info: ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version torch install path.................... 1.8.1............... torch cuda version ............... 11.1 ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']nvcc version  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. ..................... torch version11.2 ....................deepspeed install path 1.8.1........... async_io ............... [NO] ....... [NO] ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']torch cuda version ...............deepspeed info 11.1................... 0.4.2+bc17042, bc17042, big-sciencenvcc version deepspeed wheel compiled w...................... 
......11.2 torch 1.8, cuda 11.1deepspeed install path transformer_inference .. [NO] ....... [OKAY] ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] utils .................. [YES] ...... [OKAY] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- -------------------------------------------------- --------------------------------------------------DeepSpeed C++/CUDA extension op report -------------------------------------------------- DeepSpeed C++/CUDA extension op reportNOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. ---------------------------------------------------------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.JIT compiled ops requires ninja-------------------------------------------------- -------------------------------------------------- DeepSpeed C++/CUDA extension op reportJIT compiled ops requires ninja -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. ---------------------------------------------------------------------------------------------------- JIT compiled ops requires ninja DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. 
-------------------------------------------------- JIT compiled ops requires ninja ninjaninjaninjaninja ........................................................................ [OKAY] [OKAY] [OKAY][OKAY] -------------------------------------------------- ---------------------------------------------------------------------------------------------------- -------------------------------------------------- op nameop name op name ................op name................................ ................installedinstalled installed installed .... ....compatible compatible compatible-------------------------------------------------- compatible -------------------------------------------------- ---------------------------------------------------------------------------------------------------- cpu_adam cpu_adam............... cpu_adam............... [YES]cpu_adam ............... [YES]..................... [OKAY][YES] ............ [YES] [OKAY] [OKAY] ...... [OKAY] fused_adam ............. fused_adam[NO]fused_adam ....... fused_adam............. ............. [OKAY] [NO] .............[NO] ....... fused_lamb[NO] [OKAY] ............. ....... ....... [NO] [OKAY] [OKAY] fused_lamb....... .............[OKAY] [NO]fused_lambfused_lamb ................................. [OKAY][NO][NO] .............. sparse_attn[OKAY][OKAY] ............ [NO] ....... sparse_attn[OKAY] ............ [NO]transformer ................... [OKAY]sparse_attn[NO]sparse_attn ...............................transformer [OKAY][NO]............[NO] ....... [NO] .......[OKAY]stochastic_transformer [OKAY]....... .transformer[OKAY] transformer[NO] ............................... stochastic_transformer[NO][OKAY] [NO] ............... [NO] [OKAY] [OKAY] ....... [OKAY]stochastic_transformer stochastic_transformer . [NO]. .......[NO] [OKAY]....... 
[OKAY] /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown ******** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... 
using world size: 256, data-parallel-size: 8, tensor-model-parallel size: 4, pipeline-model-parallel size: 8
using torch.float16 for parameters ...
------------------------ arguments ------------------------
accumulate_allreduce_grads_in_fp32 .............. False
adam_beta1 ...................................... 0.9
adam_beta2 ...................................... 0.999
adam_eps ........................................ 1e-08
adlr_autoresume ................................. False
adlr_autoresume_interval ........................ 1000
apply_query_key_layer_scaling ................... True
apply_residual_connection_post_layernorm ........ False
attention_dropout ............................... 0.1
attention_softmax_in_fp32 ....................... False
bert_binary_head ................................ True
bert_load ....................................... None
bf16 ............................................ False
bias_dropout_fusion ............................. True
bias_gelu_fusion ................................ True
biencoder_projection_dim ........................ 0
biencoder_shared_query_context_model ............ False
block_data_path ................................. None
checkpoint_activations .......................... True
checkpoint_in_cpu ............................... False
checkpoint_num_layers ........................... 1
clip_grad ....................................... 1.0
codecarbon_dir .................................. /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/tr8-104B-logs/codecarbon
consumed_train_samples .......................... 0
consumed_valid_samples .......................... 0
contigious_checkpointing ........................ False
cpu_optimizer ................................... False
cpu_torch_adam .................................. False
data_impl ....................................... mmap
data_parallel_size .............................. 8
data_path ....................................... ['/gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document']
dataloader_type ................................. single
DDP_impl ........................................ local
decoder_seq_length .............................. None
deepscale ....................................... False
deepscale_config ................................ None
deepspeed ....................................... True
deepspeed_activation_checkpointing .............. True
deepspeed_config ................................ ./ds_config.1162747.json
deepspeed_mpi ................................... False
distribute_checkpointed_activations ............. False
distributed_backend ............................. nccl
embedding_path .................................. None
encoder_seq_length .............................. 2048
eod_mask_loss ................................... False
eval_interval ................................... 1000
eval_iters ...................................... 5
evidence_data_path .............................. None
exit_duration_in_mins ........................... 110
exit_interval ................................... None
ffn_hidden_size ................................. 20480
finetune ........................................ False
fp16 ............................................ True
fp16_lm_cross_entropy ........................... False
fp32_residual_connection ........................ False
global_batch_size ............................... 2048
hidden_dropout .................................. 0.1
hidden_size ..................................... 16384
hysteresis ...................................... 2
ict_head_size ................................... None
ict_load ........................................ None
img_dim ......................................... 224
indexer_batch_size .............................. 128
indexer_log_interval ............................ 1000
init_method_std ................................. 0.02
init_method_xavier_uniform ...................... False
initial_loss_scale .............................. 4294967296
kv_channels ..................................... 512
layernorm_epsilon ............................... 1e-05
lazy_mpu_init ................................... None
load ............................................ /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints
local_rank ...................................... 0
log_batch_size_to_tensorboard ................... True
log_interval .................................... 1
log_learning_rate_to_tensorboard ................ True
log_loss_scale_to_tensorboard ................... True
log_num_zeros_in_grad ........................... False
log_params_norm ................................. False
log_timers_to_tensorboard ....................... True
log_validation_ppl_to_tensorboard ............... True
loss_scale ...................................... 12.0
loss_scale_window ............................... 1000
lr .............................................. 6e-05
lr_decay_iters .................................. None
lr_decay_samples ................................ 126953125
lr_decay_style .................................. cosine
lr_warmup_fraction .............................. None
lr_warmup_iters ................................. 0
lr_warmup_samples ............................... 216320
make_vocab_size_divisible_by .................... 128
mask_prob ....................................... 0.15
masked_softmax_fusion ........................... True
max_position_embeddings ......................... 2048
memory_centric_tiled_linear ..................... False
merge_file ...................................... /gpfswork/rech/six/commun/code/tr8-104B/Megatron-DeepSpeed-tr8-104B/data/gpt2-merges.txt
micro_batch_size ................................ 1
min_loss_scale .................................. 1.0
min_lr .......................................... 6e-06
mmap_warmup ..................................... False
no_load_optim ................................... None
no_load_rng ..................................... None
no_save_optim ................................... None
no_save_rng ..................................... None
num_attention_heads ............................. 32
num_channels .................................... 3
num_classes ..................................... 1000
num_layers ...................................... 32
num_layers_per_virtual_pipeline_stage ........... None
num_workers ..................................... 2
onnx_safe ....................................... None
openai_gelu ..................................... False
optimizer ....................................... adam
override_lr_scheduler ........................... False
params_dtype .................................... torch.float16
partition_activations ........................... False
patch_dim ....................................... 16
pipeline_model_parallel_size .................... 8
position_embedding_type ......................... PositionEmbeddingType.absolute
profile_backward ................................ False
query_in_block_prob ............................. 0.1
rampup_batch_size ............................... ['16', '16', '6_000_000']
rank ............................................ 0
remote_device ................................... none
reset_attention_mask ............................ False
reset_position_ids .............................. False
retriever_report_topk_accuracies ................ []
retriever_score_scaling ......................... False
retriever_seq_length ............................ 256
sample_rate ..................................... 1.0
save ............................................ /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints
save_interval ................................... 1500
scatter_gather_tensors_in_pipeline .............. True
scattered_embeddings ............................ False
seed ............................................ 42
seq_length ...................................... 2048
sgd_momentum .................................... 0.9
short_seq_prob .................................. 0.1
split ........................................... 949,50,1
split_transformers .............................. False
synchronize_each_layer .......................... False
tensor_model_parallel_size ...................... 4
tensorboard_dir ................................. /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/tr8-104B-logs/tensorboard
tensorboard_log_interval ........................ 1
tensorboard_queue_size .......................... 5
tile_factor ..................................... 1
titles_data_path ................................ None
tokenizer_name_or_path .......................... None
tokenizer_type .................................. GPT2BPETokenizer
train_iters ..................................... None
train_samples ................................... 300000000
use_checkpoint_lr_scheduler ..................... False
use_contiguous_buffers_in_ddp ................... False
use_cpu_initialization .......................... None
use_one_sent_docs ............................... False
use_pin_memory .................................. False
virtual_pipeline_model_parallel_size ............ None
vocab_extra_ids ................................. 0
vocab_file ...................................... /gpfswork/rech/six/commun/code/tr8-104B/Megatron-DeepSpeed-tr8-104B/data/gpt2-vocab.json
weight_decay .................................... 0.1
world_size ...................................... 256
zero_allgather_bucket_size ...................... 0.0
zero_contigious_gradients ....................... False
zero_reduce_bucket_size ......................... 0.0
zero_reduce_scatter ............................. False
zero_stage ...................................... 1
-------------------- end of arguments ---------------------
will use batch size rampup starting from global batch size 16 to global batch size 2048 with batch size increments 16 over 6000000 samples.
> building GPT2BPETokenizer tokenizer ...
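The ramp-up line above can be sanity-checked with a few lines of Python. This is a sketch of the arithmetic implied by `rampup_batch_size ['16', '16', '6_000_000']` (start 16, increment 16, ramp over 6,000,000 samples), assuming the increments are applied at evenly spaced sample counts; it is an illustration, not Megatron-LM's actual scheduler code.

```python
# Sketch of the batch-size ramp-up implied by rampup_batch_size = [16, 16, 6_000_000].
# Assumption: each +16 increment is held for an equal slice of the 6M-sample ramp.

def global_batch_size(consumed_samples: int,
                      start: int = 16,
                      increment: int = 16,
                      final: int = 2048,
                      ramp_samples: int = 6_000_000) -> int:
    """Return the global batch size after `consumed_samples` training samples."""
    num_increments = (final - start) // increment          # (2048-16)/16 = 127 steps
    samples_per_increment = ramp_samples / num_increments  # ~47,244 samples per step
    if consumed_samples >= ramp_samples:
        return final
    return min(final, start + increment * int(consumed_samples // samples_per_increment))

# The launch banner's layout: 256 processes = 8 (data) x 4 (tensor) x 8 (pipeline)
assert 8 * 4 * 8 == 256
```

At sample 0 this gives 16; once the 6M-sample ramp is consumed it stays at the configured `global_batch_size` of 2048.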
[OKAY] -------------------------------------------------- -------------------------------------------------- -------------------------------------------------- --------------------------------------------------DeepSpeed C++/CUDA extension op report ----------------------------------------------------------------------------------------------------DeepSpeed C++/CUDA extension op report NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.DeepSpeed C++/CUDA extension op report-------------------------------------------------- ------------------------------------------------------------------------------------------------------------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. JIT compiled ops requires ninjaNOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.-------------------------------------------------- DeepSpeed C++/CUDA extension op report --------------------------------------------------JIT compiled ops requires ninja-------------------------------------------------- JIT compiled ops requires ninjaNOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninjaninjaninjaninja ........................................................................ 
[OKAY][OKAY][OKAY][OKAY] ------------------------------------------------------------------------------------------------------------------------------------------------------ -------------------------------------------------- op nameop name op nameop name ................ ................ ................ installed ................installedinstalled ..installed .. .. ..compatible compatible compatible compatible ---------------------------------------------------------------------------------------------------- -------------------------------------------------- -------------------------------------------------- cpu_adamcpu_adam cpu_adamcpu_adam ............... ............... .............................. [YES][YES][YES] ......[YES] ............ [OKAY] ......[OKAY] [OKAY] [OKAY] fused_adam .............fused_adam [NO]fused_adam............. ....................fused_adam [NO] [OKAY] [NO] ....... .......[OKAY].............fused_lamb [OKAY][NO]............. fused_lamb [NO]............. fused_lamb ....... ....... [NO]............. [OKAY] [OKAY] .......[NO] [OKAY]....... [OKAY] sparse_attn fused_lamb............ sparse_attn.............[NO] sparse_attn[NO]................... ............[OKAY][NO] [NO]..............transformer .......[OKAY]............ [OKAY][NO] .......[OKAY] transformer[OKAY]transformer ........................ [NO][NO] stochastic_transformer ....... ....... [OKAY][OKAY]. [NO] .......stochastic_transformerstochastic_transformer [OKAY] .sparse_attn . [NO] ............ [NO] ....... .......[OKAY] [NO][OKAY] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info: torch install path ............... 
DeepSpeed general environment info:
torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']
torch version .................... 1.8.1
torch cuda version ............... 11.1
nvcc version ..................... 11.2
deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']
deepspeed info ................... 0.4.2+bc17042, bc17042, big-science
deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1
 [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`.
async_io ............... [NO] ....... [NO]
transformer_inference .. [NO] ....... [OKAY]
utils .................. [YES] ...... [OKAY]
quantizer .............. [NO] ....... [OKAY]
--------------------------------------------------
/bin/sh: line 0: type: git: not found
**** Git info for Megatron: git_hash=unknown git_branch=unknown ****
> padded vocab (size: 50257) with 431 dummy tokens (new size: 50688)
DeepSpeed general environment info:
torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']
torch version .................... 1.8.1
torch cuda version ............... 11.1
nvcc version ..................... 11.2
deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']
deepspeed info ................... 0.4.2+bc17042, bc17042, big-science
deepspeed wheel compiled w. ......
torch 1.8, cuda 11.1
/bin/sh: line 0: type: git: not found
**** Git info for Megatron: git_hash=unknown git_branch=unknown ****
> setting codecarbon ...
> initializing torch distributed ...
> initializing tensor model parallel with size 4
> initializing pipeline model parallel with size 8
> setting random seeds to 42 ...
[2021-09-24 04:01:23,432] [INFO] [checkpointing.py:226:model_parallel_cuda_manual_seed] > initializing model parallel cuda seeds on global rank 0, model parallel rank 0, and data parallel rank 0 with model parallel seed: 2760 and data parallel seed: 42
> compiling dataset index builder ...
make: Entering directory '/gpfsssd/worksf/projects/rech/six/commun/code/tr8-104B/Megatron-DeepSpeed-tr8-104B/megatron/data'
make: Nothing to be done for 'default'.
make: Leaving directory '/gpfsssd/worksf/projects/rech/six/commun/code/tr8-104B/Megatron-DeepSpeed-tr8-104B/megatron/data'
>>> done with dataset index builder. Compilation time: 0.299 seconds
> compiling and loading fused kernels ...
/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning:
                               !! WARNING !!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension.
See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source.
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
                               !! WARNING !!
  warnings.warn(WRONG_COMPILER_WARNING.format(
Detected CUDA files, patching ldflags
Emitting ninja build file /gpfsssd/worksf/projects/rech/six/commun/code/tr8-104B/Megatron-DeepSpeed-tr8-104B/megatron/fused_kernels/build/build.ninja...
Building extension module scaled_upper_triang_masked_softmax_cuda...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
Loading extension module scaled_upper_triang_masked_softmax_cuda...
Building extension module scaled_masked_softmax_cuda...
ninja: no work to do.
Loading extension module scaled_masked_softmax_cuda...
Building extension module fused_mix_prec_layer_norm_cuda...
ninja: no work to do.
Loading extension module fused_mix_prec_layer_norm_cuda...
>>> done with compiling and loading fused kernels. Compilation time: 17.207 seconds
time to initialize megatron (seconds): 4.980
[after megatron is initialized] datetime: 2021-09-24 04:01:40
building GPT model ...
[2021-09-24 04:01:41,035] [INFO] [utils.py:680:see_memory_usage] Before Building Model
/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch/cuda/memory.py:373: FutureWarning: torch.cuda.memory_cached has been renamed to torch.cuda.memory_reserved
  warnings.warn(
/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch/cuda/memory.py:381: FutureWarning: torch.cuda.max_memory_cached has been renamed to torch.cuda.max_memory_reserved
  warnings.warn(
[2021-09-24 04:01:41,037] [INFO] [utils.py:681:see_memory_usage] MA 0.0 GB Max_MA 0.0 GB CA 0.0 GB Max_CA 0 GB
[2021-09-24 04:01:41,037] [INFO] [utils.py:689:see_memory_usage] CPU Virtual Memory: used = 37.36 GB, percent = 20.0%
SEED_LAYERS=False BASE_SEED=1234 SEED_FN=None
Using topology: {ProcessCoord(pipe=0, data=0, model=0): 0, ProcessCoord(pipe=0, data=0, model=1): 1, ProcessCoord(pipe=0, data=0, model=2): 2, ProcessCoord(pipe=0, data=0, model=3): 3, ProcessCoord(pipe=0, data=1, model=0): 4, ProcessCoord(pipe=0, data=1, model=1): 5, ProcessCoord(pipe=0, data=1, model=2): 6, ProcessCoord(pipe=0, data=1, model=3): 7, ProcessCoord(pipe=0, data=2, model=0): 8,
ProcessCoord(pipe=0, data=2, model=1): 9, ProcessCoord(pipe=0, data=2, model=2): 10, ProcessCoord(pipe=0, data=2, model=3): 11, ProcessCoord(pipe=0, data=3, model=0): 12, ProcessCoord(pipe=0, data=3, model=1): 13, ProcessCoord(pipe=0, data=3, model=2): 14, ProcessCoord(pipe=0, data=3, model=3): 15, ProcessCoord(pipe=0, data=4, model=0): 16, ProcessCoord(pipe=0, data=4, model=1): 17, ProcessCoord(pipe=0, data=4, model=2): 18, ProcessCoord(pipe=0, data=4, model=3): 19, ProcessCoord(pipe=0, data=5, model=0): 20, ProcessCoord(pipe=0, data=5, model=1): 21, ProcessCoord(pipe=0, data=5, model=2): 22, ProcessCoord(pipe=0, data=5, model=3): 23, ProcessCoord(pipe=0, data=6, model=0): 24, ProcessCoord(pipe=0, data=6, model=1): 25, ProcessCoord(pipe=0, data=6, model=2): 26, ProcessCoord(pipe=0, data=6, model=3): 27, ProcessCoord(pipe=0, data=7, model=0): 28, ProcessCoord(pipe=0, data=7, model=1): 29, ProcessCoord(pipe=0, data=7, model=2): 30, ProcessCoord(pipe=0, data=7, model=3): 31, ProcessCoord(pipe=1, data=0, model=0): 32, ProcessCoord(pipe=1, data=0, model=1): 33, ProcessCoord(pipe=1, data=0, model=2): 34, ProcessCoord(pipe=1, data=0, model=3): 35, ProcessCoord(pipe=1, data=1, model=0): 36, ProcessCoord(pipe=1, data=1, model=1): 37, ProcessCoord(pipe=1, data=1, model=2): 38, ProcessCoord(pipe=1, data=1, model=3): 39, ProcessCoord(pipe=1, data=2, model=0): 40, ProcessCoord(pipe=1, data=2, model=1): 41, ProcessCoord(pipe=1, data=2, model=2): 42, ProcessCoord(pipe=1, data=2, model=3): 43, ProcessCoord(pipe=1, data=3, model=0): 44, ProcessCoord(pipe=1, data=3, model=1): 45, ProcessCoord(pipe=1, data=3, model=2): 46, ProcessCoord(pipe=1, data=3, model=3): 47, ProcessCoord(pipe=1, data=4, model=0): 48, ProcessCoord(pipe=1, data=4, model=1): 49, ProcessCoord(pipe=1, data=4, model=2): 50, ProcessCoord(pipe=1, data=4, model=3): 51, ProcessCoord(pipe=1, data=5, model=0): 52, ProcessCoord(pipe=1, data=5, model=1): 53, ProcessCoord(pipe=1, data=5, model=2): 54, ProcessCoord(pipe=1, 
data=5, model=3): 55, ProcessCoord(pipe=1, data=6, model=0): 56, ProcessCoord(pipe=1, data=6, model=1): 57, ProcessCoord(pipe=1, data=6, model=2): 58, ProcessCoord(pipe=1, data=6, model=3): 59, ProcessCoord(pipe=1, data=7, model=0): 60, ProcessCoord(pipe=1, data=7, model=1): 61, ProcessCoord(pipe=1, data=7, model=2): 62, ProcessCoord(pipe=1, data=7, model=3): 63, ProcessCoord(pipe=2, data=0, model=0): 64, ProcessCoord(pipe=2, data=0, model=1): 65, ProcessCoord(pipe=2, data=0, model=2): 66, ProcessCoord(pipe=2, data=0, model=3): 67, ProcessCoord(pipe=2, data=1, model=0): 68, ProcessCoord(pipe=2, data=1, model=1): 69, ProcessCoord(pipe=2, data=1, model=2): 70, ProcessCoord(pipe=2, data=1, model=3): 71, ProcessCoord(pipe=2, data=2, model=0): 72, ProcessCoord(pipe=2, data=2, model=1): 73, ProcessCoord(pipe=2, data=2, model=2): 74, ProcessCoord(pipe=2, data=2, model=3): 75, ProcessCoord(pipe=2, data=3, model=0): 76, ProcessCoord(pipe=2, data=3, model=1): 77, ProcessCoord(pipe=2, data=3, model=2): 78, ProcessCoord(pipe=2, data=3, model=3): 79, ProcessCoord(pipe=2, data=4, model=0): 80, ProcessCoord(pipe=2, data=4, model=1): 81, ProcessCoord(pipe=2, data=4, model=2): 82, ProcessCoord(pipe=2, data=4, model=3): 83, ProcessCoord(pipe=2, data=5, model=0): 84, ProcessCoord(pipe=2, data=5, model=1): 85, ProcessCoord(pipe=2, data=5, model=2): 86, ProcessCoord(pipe=2, data=5, model=3): 87, ProcessCoord(pipe=2, data=6, model=0): 88, ProcessCoord(pipe=2, data=6, model=1): 89, ProcessCoord(pipe=2, data=6, model=2): 90, ProcessCoord(pipe=2, data=6, model=3): 91, ProcessCoord(pipe=2, data=7, model=0): 92, ProcessCoord(pipe=2, data=7, model=1): 93, ProcessCoord(pipe=2, data=7, model=2): 94, ProcessCoord(pipe=2, data=7, model=3): 95, ProcessCoord(pipe=3, data=0, model=0): 96, ProcessCoord(pipe=3, data=0, model=1): 97, ProcessCoord(pipe=3, data=0, model=2): 98, ProcessCoord(pipe=3, data=0, model=3): 99, ProcessCoord(pipe=3, data=1, model=0): 100, ProcessCoord(pipe=3, data=1, model=1): 
101, ProcessCoord(pipe=3, data=1, model=2): 102, ProcessCoord(pipe=3, data=1, model=3): 103, ProcessCoord(pipe=3, data=2, model=0): 104, ProcessCoord(pipe=3, data=2, model=1): 105, ProcessCoord(pipe=3, data=2, model=2): 106, ProcessCoord(pipe=3, data=2, model=3): 107, ProcessCoord(pipe=3, data=3, model=0): 108, ProcessCoord(pipe=3, data=3, model=1): 109, ProcessCoord(pipe=3, data=3, model=2): 110, ProcessCoord(pipe=3, data=3, model=3): 111, ProcessCoord(pipe=3, data=4, model=0): 112, ProcessCoord(pipe=3, data=4, model=1): 113, ProcessCoord(pipe=3, data=4, model=2): 114, ProcessCoord(pipe=3, data=4, model=3): 115, ProcessCoord(pipe=3, data=5, model=0): 116, ProcessCoord(pipe=3, data=5, model=1): 117, ProcessCoord(pipe=3, data=5, model=2): 118, ProcessCoord(pipe=3, data=5, model=3): 119, ProcessCoord(pipe=3, data=6, model=0): 120, ProcessCoord(pipe=3, data=6, model=1): 121, ProcessCoord(pipe=3, data=6, model=2): 122, ProcessCoord(pipe=3, data=6, model=3): 123, ProcessCoord(pipe=3, data=7, model=0): 124, ProcessCoord(pipe=3, data=7, model=1): 125, ProcessCoord(pipe=3, data=7, model=2): 126, ProcessCoord(pipe=3, data=7, model=3): 127, ProcessCoord(pipe=4, data=0, model=0): 128, ProcessCoord(pipe=4, data=0, model=1): 129, ProcessCoord(pipe=4, data=0, model=2): 130, ProcessCoord(pipe=4, data=0, model=3): 131, ProcessCoord(pipe=4, data=1, model=0): 132, ProcessCoord(pipe=4, data=1, model=1): 133, ProcessCoord(pipe=4, data=1, model=2): 134, ProcessCoord(pipe=4, data=1, model=3): 135, ProcessCoord(pipe=4, data=2, model=0): 136, ProcessCoord(pipe=4, data=2, model=1): 137, ProcessCoord(pipe=4, data=2, model=2): 138, ProcessCoord(pipe=4, data=2, model=3): 139, ProcessCoord(pipe=4, data=3, model=0): 140, ProcessCoord(pipe=4, data=3, model=1): 141, ProcessCoord(pipe=4, data=3, model=2): 142, ProcessCoord(pipe=4, data=3, model=3): 143, ProcessCoord(pipe=4, data=4, model=0): 144, ProcessCoord(pipe=4, data=4, model=1): 145, ProcessCoord(pipe=4, data=4, model=2): 146, 
ProcessCoord(pipe=4, data=4, model=3): 147, ProcessCoord(pipe=4, data=5, model=0): 148, ProcessCoord(pipe=4, data=5, model=1): 149, ProcessCoord(pipe=4, data=5, model=2): 150, ProcessCoord(pipe=4, data=5, model=3): 151, ProcessCoord(pipe=4, data=6, model=0): 152, ProcessCoord(pipe=4, data=6, model=1): 153, ProcessCoord(pipe=4, data=6, model=2): 154, ProcessCoord(pipe=4, data=6, model=3): 155, ProcessCoord(pipe=4, data=7, model=0): 156, ProcessCoord(pipe=4, data=7, model=1): 157, ProcessCoord(pipe=4, data=7, model=2): 158, ProcessCoord(pipe=4, data=7, model=3): 159, ProcessCoord(pipe=5, data=0, model=0): 160, ProcessCoord(pipe=5, data=0, model=1): 161, ProcessCoord(pipe=5, data=0, model=2): 162, ProcessCoord(pipe=5, data=0, model=3): 163, ProcessCoord(pipe=5, data=1, model=0): 164, ProcessCoord(pipe=5, data=1, model=1): 165, ProcessCoord(pipe=5, data=1, model=2): 166, ProcessCoord(pipe=5, data=1, model=3): 167, ProcessCoord(pipe=5, data=2, model=0): 168, ProcessCoord(pipe=5, data=2, model=1): 169, ProcessCoord(pipe=5, data=2, model=2): 170, ProcessCoord(pipe=5, data=2, model=3): 171, ProcessCoord(pipe=5, data=3, model=0): 172, ProcessCoord(pipe=5, data=3, model=1): 173, ProcessCoord(pipe=5, data=3, model=2): 174, ProcessCoord(pipe=5, data=3, model=3): 175, ProcessCoord(pipe=5, data=4, model=0): 176, ProcessCoord(pipe=5, data=4, model=1): 177, ProcessCoord(pipe=5, data=4, model=2): 178, ProcessCoord(pipe=5, data=4, model=3): 179, ProcessCoord(pipe=5, data=5, model=0): 180, ProcessCoord(pipe=5, data=5, model=1): 181, ProcessCoord(pipe=5, data=5, model=2): 182, ProcessCoord(pipe=5, data=5, model=3): 183, ProcessCoord(pipe=5, data=6, model=0): 184, ProcessCoord(pipe=5, data=6, model=1): 185, ProcessCoord(pipe=5, data=6, model=2): 186, ProcessCoord(pipe=5, data=6, model=3): 187, ProcessCoord(pipe=5, data=7, model=0): 188, ProcessCoord(pipe=5, data=7, model=1): 189, ProcessCoord(pipe=5, data=7, model=2): 190, ProcessCoord(pipe=5, data=7, model=3): 191, 
ProcessCoord(pipe=6, data=0, model=0): 192, ProcessCoord(pipe=6, data=0, model=1): 193, ProcessCoord(pipe=6, data=0, model=2): 194, ProcessCoord(pipe=6, data=0, model=3): 195, ProcessCoord(pipe=6, data=1, model=0): 196, ProcessCoord(pipe=6, data=1, model=1): 197, ProcessCoord(pipe=6, data=1, model=2): 198, ProcessCoord(pipe=6, data=1, model=3): 199, ProcessCoord(pipe=6, data=2, model=0): 200, ProcessCoord(pipe=6, data=2, model=1): 201, ProcessCoord(pipe=6, data=2, model=2): 202, ProcessCoord(pipe=6, data=2, model=3): 203, ProcessCoord(pipe=6, data=3, model=0): 204, ProcessCoord(pipe=6, data=3, model=1): 205, ProcessCoord(pipe=6, data=3, model=2): 206, ProcessCoord(pipe=6, data=3, model=3): 207, ProcessCoord(pipe=6, data=4, model=0): 208, ProcessCoord(pipe=6, data=4, model=1): 209, ProcessCoord(pipe=6, data=4, model=2): 210, ProcessCoord(pipe=6, data=4, model=3): 211, ProcessCoord(pipe=6, data=5, model=0): 212, ProcessCoord(pipe=6, data=5, model=1): 213, ProcessCoord(pipe=6, data=5, model=2): 214, ProcessCoord(pipe=6, data=5, model=3): 215, ProcessCoord(pipe=6, data=6, model=0): 216, ProcessCoord(pipe=6, data=6, model=1): 217, ProcessCoord(pipe=6, data=6, model=2): 218, ProcessCoord(pipe=6, data=6, model=3): 219, ProcessCoord(pipe=6, data=7, model=0): 220, ProcessCoord(pipe=6, data=7, model=1): 221, ProcessCoord(pipe=6, data=7, model=2): 222, ProcessCoord(pipe=6, data=7, model=3): 223, ProcessCoord(pipe=7, data=0, model=0): 224, ProcessCoord(pipe=7, data=0, model=1): 225, ProcessCoord(pipe=7, data=0, model=2): 226, ProcessCoord(pipe=7, data=0, model=3): 227, ProcessCoord(pipe=7, data=1, model=0): 228, ProcessCoord(pipe=7, data=1, model=1): 229, ProcessCoord(pipe=7, data=1, model=2): 230, ProcessCoord(pipe=7, data=1, model=3): 231, ProcessCoord(pipe=7, data=2, model=0): 232, ProcessCoord(pipe=7, data=2, model=1): 233, ProcessCoord(pipe=7, data=2, model=2): 234, ProcessCoord(pipe=7, data=2, model=3): 235, ProcessCoord(pipe=7, data=3, model=0): 236, 
ProcessCoord(pipe=7, data=3, model=1): 237, ProcessCoord(pipe=7, data=3, model=2): 238, ProcessCoord(pipe=7, data=3, model=3): 239, ProcessCoord(pipe=7, data=4, model=0): 240, ProcessCoord(pipe=7, data=4, model=1): 241, ProcessCoord(pipe=7, data=4, model=2): 242, ProcessCoord(pipe=7, data=4, model=3): 243, ProcessCoord(pipe=7, data=5, model=0): 244, ProcessCoord(pipe=7, data=5, model=1): 245, ProcessCoord(pipe=7, data=5, model=2): 246, ProcessCoord(pipe=7, data=5, model=3): 247, ProcessCoord(pipe=7, data=6, model=0): 248, ProcessCoord(pipe=7, data=6, model=1): 249, ProcessCoord(pipe=7, data=6, model=2): 250, ProcessCoord(pipe=7, data=6, model=3): 251, ProcessCoord(pipe=7, data=7, model=0): 252, ProcessCoord(pipe=7, data=7, model=1): 253, ProcessCoord(pipe=7, data=7, model=2): 254, ProcessCoord(pipe=7, data=7, model=3): 255}
[2021-09-24 04:01:42,442] [INFO] [module.py:360:_partition_layers] Partitioning pipeline stages with method type:transformer
stage=0 layers=7
    0: _to_float16
    1: EmbeddingPipe
    2:
    3: ParallelTransformerLayerPipe
    4: ParallelTransformerLayerPipe
    5: ParallelTransformerLayerPipe
    6: ParallelTransformerLayerPipe
stage=1 layers=4
    7: ParallelTransformerLayerPipe
    8: ParallelTransformerLayerPipe
    9: ParallelTransformerLayerPipe
    10: ParallelTransformerLayerPipe
stage=2 layers=4
    11: ParallelTransformerLayerPipe
    12: ParallelTransformerLayerPipe
    13: ParallelTransformerLayerPipe
    14: ParallelTransformerLayerPipe
stage=3 layers=4
    15: ParallelTransformerLayerPipe
    16: ParallelTransformerLayerPipe
    17: ParallelTransformerLayerPipe
    18: ParallelTransformerLayerPipe
stage=4 layers=4
    19: ParallelTransformerLayerPipe
    20: ParallelTransformerLayerPipe
    21: ParallelTransformerLayerPipe
    22: ParallelTransformerLayerPipe
stage=5 layers=4
    23: ParallelTransformerLayerPipe
    24: ParallelTransformerLayerPipe
    25: ParallelTransformerLayerPipe
    26: ParallelTransformerLayerPipe
stage=6 layers=4
    27: ParallelTransformerLayerPipe
    28: ParallelTransformerLayerPipe
    29: ParallelTransformerLayerPipe
    30: ParallelTransformerLayerPipe
stage=7 layers=8
    31: ParallelTransformerLayerPipe
    32: ParallelTransformerLayerPipe
    33: ParallelTransformerLayerPipe
    34: ParallelTransformerLayerPipe
    35:
    36: MixedFusedLayerNorm
    37: EmbeddingPipe
    38: float16_to_fp32
  loss: CrossEntropy
 > number of parameters on (tensor, pipeline) model parallel rank (1, 5): 1745293312
 > number of parameters on (tensor, pipeline) model parallel rank (0, 2): 1745293312
 > number of parameters on (tensor, pipeline) model parallel rank (3, 2): 1745293312
 > number of parameters on (tensor, pipeline) model parallel rank (0, 1): 1745293312
 > number of parameters on (tensor, pipeline) model parallel rank (2, 1): 1745293312
 > number of parameters on (tensor, pipeline) model parallel rank (0, 4): 1745293312
 > number of parameters on (tensor, pipeline) model parallel rank (2, 4): 1745293312
 > number of parameters on (tensor, pipeline) model parallel rank (3, 4): 1745293312
 > number of parameters on (tensor, pipeline) model parallel rank (1, 4): 1745293312
 > number of parameters on (tensor, pipeline) model parallel rank (2, 2): 1745293312
 > number of parameters on (tensor, pipeline) model parallel rank (0, 6): 1745293312
 > number of parameters on (tensor, pipeline) model parallel rank (2, 6): 1745293312
 > number of parameters on (tensor, pipeline) model parallel rank (3, 6): 1745293312
 > number of parameters on (tensor, pipeline) model parallel rank (1, 6): 1745293312
 > number of parameters on (tensor, pipeline) model parallel rank (0, 3): 1745293312
 > number of parameters on (tensor, pipeline) model parallel rank (1, 3): 1745293312
 > number of parameters on (tensor, pipeline) model parallel rank (2, 3): 1745293312
 > number of parameters on (tensor, pipeline) model parallel rank (3, 3): 1745293312
 > number of parameters on (tensor, pipeline) model parallel rank (3, 1): 1745293312
 > number of parameters on (tensor, pipeline) model parallel rank (1, 1): 1745293312
 > number of parameters on (tensor, pipeline) model parallel rank (0, 5): 1745293312
 > number of parameters on (tensor, pipeline) model parallel rank (1, 2): 1745293312
 > number of parameters on (tensor, pipeline) model parallel rank (3, 5): 1745293312
 > number of parameters on (tensor, pipeline) model parallel rank (2, 5): 1745293312
 > number of parameters on (tensor, pipeline) model parallel rank (1, 7): 1986498560
 > number of parameters on (tensor, pipeline) model parallel rank (2, 7): 1986498560
 > number of parameters on (tensor, pipeline) model parallel rank (0, 7): 1986498560
 > number of parameters on (tensor, pipeline) model parallel rank (3, 7): 1986498560
 > number of parameters on (tensor, pipeline) model parallel rank (2, 0): 1986465792
 > number of parameters on (tensor, pipeline) model parallel rank (1, 0): 1986465792
 > number of parameters on (tensor, pipeline) model parallel rank (3, 0): 1986465792
[2021-09-24 04:01:43,676] [INFO] [utils.py:680:see_memory_usage] After Building Model
[2021-09-24 04:01:43,677] [INFO] [utils.py:681:see_memory_usage] MA 3.77 GB Max_MA 3.79 GB CA 3.79 GB Max_CA 4 GB
[2021-09-24 04:01:43,677] [INFO] [utils.py:689:see_memory_usage] CPU Virtual Memory: used = 37.55 GB, percent = 20.1%
 > number of parameters on (tensor, pipeline) model parallel rank (0, 0): 1986465792
setting training iterations to 159576
> learning rate decay style: cosine
DeepSpeed is enabled.
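The rank-to-coordinate table above follows a fixed enumeration for the 8 (pipe) x 8 (data) x 4 (model) grid of 256 processes: `model` varies fastest, then `data`, then `pipe`. A minimal sketch of that numbering (illustrative only; `ProcessCoord` and `coord_to_rank` here are stand-ins, not the actual DeepSpeed topology code):

```python
from collections import namedtuple

# Stand-in for the ProcessCoord printed in the log above.
ProcessCoord = namedtuple("ProcessCoord", ["pipe", "data", "model"])

PIPE, DATA, MODEL = 8, 8, 4  # grid dimensions from the log (8*8*4 = 256 ranks)

def coord_to_rank(c: ProcessCoord) -> int:
    # model varies fastest, then data, then pipe
    return (c.pipe * DATA + c.data) * MODEL + c.model

print(coord_to_rank(ProcessCoord(pipe=3, data=1, model=2)))  # 102, as in the table
print(coord_to_rank(ProcessCoord(pipe=7, data=7, model=3)))  # 255, the last rank
```

This also explains why, for example, the four tensor-parallel peers of any (pipe, data) pair occupy consecutive global ranks.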
[2021-09-24 04:01:43,733] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed info: version=0.4.2+bc17042, git-hash=bc17042, git-branch=big-science
[2021-09-24 04:01:43,813] [INFO] [engine.py:179:__init__] DeepSpeed Flops Profiler Enabled: False
[2021-09-24 04:01:43,813] [INFO] [engine.py:736:_configure_optimizer] Removing param_group that has no 'params' in the client Optimizer
[2021-09-24 04:01:43,813] [INFO] [engine.py:741:_configure_optimizer] Using client Optimizer as basic optimizer
[2021-09-24 04:01:43,813] [INFO] [engine.py:750:_configure_optimizer] DeepSpeed Basic Optimizer = FusedAdam
[2021-09-24 04:01:43,813] [INFO] [utils.py:43:is_zero_supported_optimizer] Checking ZeRO support for optimizer=FusedAdam type=
[2021-09-24 04:01:43,813] [INFO] [logging.py:68:log_dist] [Rank 0] Creating fp16 ZeRO stage 1 optimizer
[2021-09-24 04:01:43,814] [INFO] [stage2.py:106:__init__] Reduce bucket size 500000000
[2021-09-24 04:01:43,814] [INFO] [stage2.py:107:__init__] Allgather bucket size 500000000
[2021-09-24 04:01:43,814] [INFO] [stage2.py:108:__init__] CPU Offload: False
[2021-09-24 04:01:43,814] [INFO] [stage2.py:109:__init__] Round robin gradient partitioning: False
[2021-09-24 04:01:48,526] [INFO] [stage2.py:419:__init__] optimizer state initialized
[2021-09-24 04:01:48,527] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed Final Optimizer = FusedAdam
[2021-09-24 04:01:48,527] [INFO] [engine.py:553:_configure_lr_scheduler] DeepSpeed using client LR scheduler
[2021-09-24 04:01:48,527] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed LR Scheduler =
[2021-09-24 04:01:48,527] [INFO] [logging.py:68:log_dist] [Rank 0] step=0, skipped=0, lr=[0.0, 0.0], mom=[(0.9, 0.999), (0.9, 0.999)]
[2021-09-24 04:01:48,527] [INFO] [config.py:900:print] DeepSpeedEngine configuration:
[2021-09-24 04:01:48,527] [INFO] [config.py:904:print] activation_checkpointing_config {
    "partition_activations": false,
    "contiguous_memory_optimization": false,
    "cpu_checkpointing": false,
    "number_checkpoints": null,
    "synchronize_checkpoint_boundary": false,
    "profile": false
}
[2021-09-24 04:01:48,527] [INFO] [config.py:904:print] aio_config ................... {'block_size': 1048576, 'queue_depth': 8, 'thread_count': 1, 'single_submit': False, 'overlap_events': True}
[2021-09-24 04:01:48,527] [INFO] [config.py:904:print] allreduce_always_fp32 ........ False
[2021-09-24 04:01:48,527] [INFO] [config.py:904:print] amp_enabled .................. False
[2021-09-24 04:01:48,527] [INFO] [config.py:904:print] amp_params ................... False
[2021-09-24 04:01:48,527] [INFO] [config.py:904:print] checkpoint_tag_validation_enabled True
[2021-09-24 04:01:48,527] [INFO] [config.py:904:print] checkpoint_tag_validation_fail False
[2021-09-24 04:01:48,527] [INFO] [config.py:904:print] disable_allgather ............ False
[2021-09-24 04:01:48,527] [INFO] [config.py:904:print] dump_state ................... False
[2021-09-24 04:01:48,527] [INFO] [config.py:904:print] dynamic_loss_scale_args ...... {'init_scale': 4096, 'scale_window': 500, 'delayed_shift': 2, 'min_scale': 1}
[2021-09-24 04:01:48,527] [INFO] [config.py:904:print] eigenvalue_enabled ........... False
[2021-09-24 04:01:48,528] [INFO] [config.py:904:print] eigenvalue_gas_boundary_resolution 1
[2021-09-24 04:01:48,528] [INFO] [config.py:904:print] eigenvalue_layer_name ........ bert.encoder.layer
[2021-09-24 04:01:48,528] [INFO] [config.py:904:print] eigenvalue_layer_num ......... 0
[2021-09-24 04:01:48,528] [INFO] [config.py:904:print] eigenvalue_max_iter .......... 100
[2021-09-24 04:01:48,528] [INFO] [config.py:904:print] eigenvalue_stability ......... 1e-06
[2021-09-24 04:01:48,528] [INFO] [config.py:904:print] eigenvalue_tol ............... 0.01
[2021-09-24 04:01:48,528] [INFO] [config.py:904:print] eigenvalue_verbose ........... False
[2021-09-24 04:01:48,528] [INFO] [config.py:904:print] elasticity_enabled ........... False
[2021-09-24 04:01:48,528] [INFO] [config.py:904:print] flops_profiler_config ........ {
    "enabled": false,
    "profile_step": 1,
    "module_depth": -1,
    "top_modules": 1,
    "detailed": true,
    "output_file": null
}
[2021-09-24 04:01:48,528] [INFO] [config.py:904:print] fp16_enabled ................. True
[2021-09-24 04:01:48,528] [INFO] [config.py:904:print] fp16_mixed_quantize .......... False
[2021-09-24 04:01:48,528] [INFO] [config.py:904:print] global_rank .................. 0
[2021-09-24 04:01:48,528] [INFO] [config.py:904:print] gradient_accumulation_steps .. 256
[2021-09-24 04:01:48,528] [INFO] [config.py:904:print] gradient_clipping ............ 1.0
[2021-09-24 04:01:48,528] [INFO] [config.py:904:print] gradient_predivide_factor .... 1.0
[2021-09-24 04:01:48,528] [INFO] [config.py:904:print] initial_dynamic_scale ........ 4096
[2021-09-24 04:01:48,528] [INFO] [config.py:904:print] loss_scale ................... 0
[2021-09-24 04:01:48,528] [INFO] [config.py:904:print] memory_breakdown ............. False
[2021-09-24 04:01:48,528] [INFO] [config.py:904:print] optimizer_legacy_fusion ...... False
[2021-09-24 04:01:48,528] [INFO] [config.py:904:print] optimizer_name ............... None
[2021-09-24 04:01:48,528] [INFO] [config.py:904:print] optimizer_params ............. None
[2021-09-24 04:01:48,528] [INFO] [config.py:904:print] pipeline ..................... {'stages': 'auto', 'partition': 'best', 'seed_layers': False, 'activation_checkpoint_interval': 0}
[2021-09-24 04:01:48,528] [INFO] [config.py:904:print] pld_enabled .................. False
[2021-09-24 04:01:48,528] [INFO] [config.py:904:print] pld_params ................... False
[2021-09-24 04:01:48,528] [INFO] [config.py:904:print] prescale_gradients ........... False
[2021-09-24 04:01:48,528] [INFO] [config.py:904:print] quantize_change_rate ......... 0.001
[2021-09-24 04:01:48,528] [INFO] [config.py:904:print] quantize_groups .............. 1
[2021-09-24 04:01:48,528] [INFO] [config.py:904:print] quantize_offset .............. 1000
[2021-09-24 04:01:48,528] [INFO] [config.py:904:print] quantize_period .............. 1000
[2021-09-24 04:01:48,528] [INFO] [config.py:904:print] quantize_rounding ............ 0
[2021-09-24 04:01:48,528] [INFO] [config.py:904:print] quantize_start_bits .......... 16
[2021-09-24 04:01:48,528] [INFO] [config.py:904:print] quantize_target_bits ......... 8
[2021-09-24 04:01:48,528] [INFO] [config.py:904:print] quantize_training_enabled .... False
[2021-09-24 04:01:48,528] [INFO] [config.py:904:print] quantize_type ................ 0
[2021-09-24 04:01:48,528] [INFO] [config.py:904:print] quantize_verbose ............. False
[2021-09-24 04:01:48,528] [INFO] [config.py:904:print] scheduler_name ............... None
[2021-09-24 04:01:48,528] [INFO] [config.py:904:print] scheduler_params ............. None
[2021-09-24 04:01:48,529] [INFO] [config.py:904:print] sparse_attention ............. None
[2021-09-24 04:01:48,529] [INFO] [config.py:904:print] sparse_gradients_enabled ..... False
[2021-09-24 04:01:48,529] [INFO] [config.py:904:print] steps_per_print .............. 2000
[2021-09-24 04:01:48,529] [INFO] [config.py:904:print] tensorboard_enabled .......... False
[2021-09-24 04:01:48,529] [INFO] [config.py:904:print] tensorboard_job_name ......... DeepSpeedJobName
[2021-09-24 04:01:48,529] [INFO] [config.py:904:print] tensorboard_output_path ......
[2021-09-24 04:01:48,529] [INFO] [config.py:904:print] train_batch_size ............. 2048
[2021-09-24 04:01:48,529] [INFO] [config.py:904:print] train_micro_batch_size_per_gpu 1
[2021-09-24 04:01:48,529] [INFO] [config.py:904:print] use_quantizer_kernel ......... False
[2021-09-24 04:01:48,529] [INFO] [config.py:904:print] wall_clock_breakdown ......... False
[2021-09-24 04:01:48,529] [INFO] [config.py:904:print] world_size ................... 8
[2021-09-24 04:01:48,529] [INFO] [config.py:904:print] zero_allow_untested_optimizer False
[2021-09-24 04:01:48,529] [INFO] [config.py:904:print] zero_config .................. {
    "stage": 1,
    "contiguous_gradients": false,
    "reduce_scatter": true,
    "reduce_bucket_size": 5.000000e+08,
    "allgather_partitions": true,
    "allgather_bucket_size": 5.000000e+08,
    "overlap_comm": false,
    "load_from_fp32_weights": true,
    "elastic_checkpoint": true,
    "offload_param": null,
    "offload_optimizer": null,
    "sub_group_size": 1.000000e+09,
    "prefetch_bucket_size": 5.000000e+07,
    "param_persistence_threshold": 1.000000e+05,
    "max_live_parameters": 1.000000e+09,
    "max_reuse_distance": 1.000000e+09,
    "gather_fp16_weights_on_model_save": false,
    "ignore_unused_parameters": true,
    "round_robin_gradients": false,
    "legacy_stage1": false
}
[2021-09-24 04:01:48,529] [INFO] [config.py:904:print] zero_enabled ................. True
[2021-09-24 04:01:48,529] [INFO] [config.py:904:print] zero_optimization_stage ...... 1
[2021-09-24 04:01:48,529] [INFO] [config.py:906:print] json = {
    "train_micro_batch_size_per_gpu": 1,
    "train_batch_size": 2.048000e+03,
    "gradient_clipping": 1.0,
    "zero_optimization": {
        "stage": 1
    },
    "fp16": {
        "enabled": true,
        "loss_scale": 0,
        "loss_scale_window": 500,
        "hysteresis": 2,
        "min_loss_scale": 1,
        "initial_scale_power": 12
    },
    "steps_per_print": 2.000000e+03,
    "wall_clock_breakdown": false
}
[2021-09-24 04:01:48,529] [INFO] [engine.py:76:__init__] CONFIG: micro_batches=256 micro_batch_size=1
[2021-09-24 04:01:48,959] [INFO] [engine.py:134:__init__] RANK=0 STAGE=0 LAYERS=7 [0, 7) STAGE_PARAMS=1986465792 (1986.466M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-24 04:01:48,959] [INFO] [engine.py:134:__init__] RANK=2 STAGE=0 LAYERS=7 [0, 7) STAGE_PARAMS=1986465792 (1986.466M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-24 04:01:48,959] [INFO] [engine.py:134:__init__] RANK=3 STAGE=0 LAYERS=7 [0, 7) STAGE_PARAMS=1986465792 (1986.466M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-24 04:01:48,959] [INFO] [engine.py:134:__init__] RANK=1 STAGE=0 LAYERS=7 [0, 7) STAGE_PARAMS=1986465792 (1986.466M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-24 04:01:48,959] [INFO] [engine.py:134:__init__] RANK=67 STAGE=2 LAYERS=4 [11, 15) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-24 04:01:48,959] [INFO] [engine.py:134:__init__] RANK=64 STAGE=2 LAYERS=4 [11, 15) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-24 04:01:48,959] [INFO] [engine.py:134:__init__] RANK=66 STAGE=2 LAYERS=4 [11, 15) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-24 04:01:48,959] [INFO] [engine.py:134:__init__] RANK=130 STAGE=4 LAYERS=4 [19, 23) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-24 04:01:48,959] [INFO] [engine.py:134:__init__] RANK=129 STAGE=4 LAYERS=4 [19, 23) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-24 04:01:48,959] [INFO] [engine.py:134:__init__] RANK=131 STAGE=4 LAYERS=4 [19, 23) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-24 04:01:48,959] [INFO] [engine.py:134:__init__] RANK=128 STAGE=4 LAYERS=4 [19, 23) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-24 04:01:48,959] [INFO] [engine.py:134:__init__] RANK=193 STAGE=6 LAYERS=4 [27, 31) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-24 04:01:48,959] [INFO] [engine.py:134:__init__] RANK=194 STAGE=6 LAYERS=4 [27, 31) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-24 04:01:48,959] [INFO] [engine.py:134:__init__] RANK=195 STAGE=6 LAYERS=4 [27, 31) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-24 04:01:48,959] [INFO] [engine.py:134:__init__] RANK=65 STAGE=2 LAYERS=4 [11, 15) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-24 04:01:48,959] [INFO] [engine.py:134:__init__] RANK=226 STAGE=7 LAYERS=8 [31, 39) STAGE_PARAMS=1986498560 (1986.499M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-24 04:01:48,959] [INFO] [engine.py:134:__init__] RANK=225 STAGE=7 LAYERS=8 [31, 39) STAGE_PARAMS=1986498560 (1986.499M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-24 04:01:48,959] [INFO] [engine.py:134:__init__] RANK=227 STAGE=7 LAYERS=8 [31, 39) STAGE_PARAMS=1986498560 (1986.499M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-24 04:01:48,959] [INFO] [engine.py:134:__init__] RANK=224 STAGE=7 LAYERS=8 [31, 39) STAGE_PARAMS=1986498560 (1986.499M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-24 04:01:48,959] [INFO] [engine.py:134:__init__] RANK=99 STAGE=3 LAYERS=4 [15, 19) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-24 04:01:48,959] [INFO] [engine.py:134:__init__] RANK=96 STAGE=3 LAYERS=4 [15, 19) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-24 04:01:48,959] [INFO] [engine.py:134:__init__] RANK=97 STAGE=3 LAYERS=4 [15, 19) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-24 04:01:48,959] [INFO] [engine.py:134:__init__] RANK=35 STAGE=1 LAYERS=4 [7, 11) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-24 04:01:48,959] [INFO] [engine.py:134:__init__] RANK=33 STAGE=1 LAYERS=4 [7, 11) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-24 04:01:48,959] [INFO] [engine.py:134:__init__] RANK=32 STAGE=1 LAYERS=4 [7, 11) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-24 04:01:48,959] [INFO] [engine.py:134:__init__] RANK=34 STAGE=1 LAYERS=4 [7, 11) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-24 04:01:48,959] [INFO] [engine.py:134:__init__] RANK=163 STAGE=5 LAYERS=4 [23, 27) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-24 04:01:48,959] [INFO] [engine.py:134:__init__] RANK=161 STAGE=5 LAYERS=4 [23, 27) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-24 04:01:48,959] [INFO] [engine.py:134:__init__] RANK=160 STAGE=5 LAYERS=4 [23, 27) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-24 04:01:48,959] [INFO] [engine.py:134:__init__] RANK=162 STAGE=5 LAYERS=4 [23, 27) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-24 04:01:48,959] [INFO] [engine.py:134:__init__] RANK=192 STAGE=6 LAYERS=4 [27, 31) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-24 04:01:48,959] [INFO] [engine.py:134:__init__] RANK=98 STAGE=3 LAYERS=4 [15, 19) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
 > using checkpoint value 6e-05 for learning rate
 > using checkpoint value 6e-06 for minimum learning rate
 > using checkpoint value 216320 for warmup iterations
 > using checkpoint value 126953125 for total number of iterations
 > using checkpoint value cosine for decay style
successfully loaded 8 ZeRO state_dicts for rank 124
successfully loaded 8 ZeRO state_dicts for rank 115
successfully loaded 8 ZeRO state_dicts for rank 60
successfully loaded 8 ZeRO state_dicts for rank 48
successfully loaded 8 ZeRO state_dicts for rank 61
successfully loaded 8 ZeRO state_dicts for rank 125
successfully loaded 8 ZeRO state_dicts for rank 126
successfully loaded 8 ZeRO state_dicts for rank 127
successfully loaded 8 ZeRO state_dicts for rank 160
successfully loaded 8 ZeRO state_dicts for rank 135
successfully loaded 8 ZeRO state_dicts for rank 68
successfully loaded 8 ZeRO state_dicts for rank 113
successfully loaded 8 ZeRO state_dicts for rank 108
successfully loaded 8 ZeRO state_dicts for rank 27
successfully loaded 8 ZeRO state_dicts for rank 72
successfully loaded 8 ZeRO state_dicts for rank 49
successfully loaded 8 ZeRO state_dicts for rank 71
successfully loaded 8 ZeRO state_dicts for rank 147
successfully loaded 8 ZeRO state_dicts for rank 96
successfully loaded 8 ZeRO state_dicts for rank 32
successfully loaded 8 ZeRO state_dicts for rank 214
successfully loaded 8 ZeRO state_dicts for rank 143
successfully loaded 8 ZeRO state_dicts for rank 158
successfully loaded 8 ZeRO state_dicts for rank 132
successfully loaded 8 ZeRO state_dicts for rank 111
successfully loaded 8 ZeRO state_dicts for rank 155
successfully loaded 8 ZeRO state_dicts for rank 112
successfully loaded 8 ZeRO state_dicts for rank 76
successfully loaded 8 ZeRO state_dicts for rank 63
successfully loaded 8 ZeRO state_dicts for rank 44
successfully loaded 8 ZeRO state_dicts for rank 201
successfully loaded 8 ZeRO state_dicts for rank 213
successfully loaded 8 ZeRO state_dicts for rank 162
successfully loaded 8 ZeRO state_dicts for rank 97
successfully loaded 8 ZeRO state_dicts for rank 51
successfully loaded 8 ZeRO state_dicts for rank 133
loading 8 zero partition checkpoints for rank 124
successfully loaded 8 ZeRO state_dicts for rank 114
successfully loaded 8 ZeRO state_dicts for rank 33
successfully loaded 8 ZeRO state_dicts for rank 140
successfully loaded 8 ZeRO state_dicts for rank 181
successfully loaded 8 ZeRO state_dicts for rank 41
successfully loaded 8 ZeRO state_dicts for rank 185
successfully loaded 8 ZeRO state_dicts for rank 241
successfully loaded 8 ZeRO state_dicts for rank 134
successfully loaded 8 ZeRO state_dicts for rank 39
successfully loaded 8 ZeRO state_dicts for rank 24
successfully loaded 8 ZeRO state_dicts for rank 212
successfully loaded 8 ZeRO state_dicts for rank 104
successfully loaded 8 ZeRO state_dicts for rank 142
successfully loaded 8 ZeRO state_dicts for rank 154
successfully loaded 8 ZeRO state_dicts for rank 159
successfully loaded 8 ZeRO state_dicts for rank 166
successfully loaded 8 ZeRO state_dicts for rank 148
successfully loaded 8 ZeRO state_dicts for rank 35
successfully loaded 8 ZeRO state_dicts for rank 70
successfully loaded 8 ZeRO state_dicts for rank 75
WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-24 04:02:16 CEST)" was missed by 0:00:03.600668
successfully loaded 8 ZeRO state_dicts for rank 156
successfully loaded 8 ZeRO state_dicts for rank 161
successfully loaded 8 ZeRO state_dicts for rank 243
successfully loaded 8 ZeRO state_dicts for rank 40
successfully loaded 8 ZeRO state_dicts for rank 141
successfully loaded 8 ZeRO state_dicts for rank 98
successfully loaded 8 ZeRO state_dicts for rank 210
successfully loaded 8 ZeRO state_dicts for rank 52
successfully loaded 8 ZeRO state_dicts for rank 28
successfully loaded 8 ZeRO state_dicts for rank 110
successfully loaded 8 ZeRO state_dicts for rank 139
successfully loaded 8 ZeRO state_dicts for rank 36
successfully loaded 8 ZeRO state_dicts for rank 168
successfully loaded 8 ZeRO state_dicts for rank 26
successfully loaded 8 ZeRO state_dicts for rank 84
successfully loaded 8 ZeRO state_dicts for rank 208
successfully loaded 8 ZeRO state_dicts for rank 190
successfully loaded 8 ZeRO state_dicts for rank 92
loading 8 zero partition checkpoints for rank 115
successfully loaded 8 ZeRO state_dicts for rank 34
successfully loaded 8 ZeRO state_dicts for rank 171
successfully loaded 8 ZeRO state_dicts for rank 152
successfully loaded 8 ZeRO state_dicts for rank 73
successfully loaded 8 ZeRO state_dicts for rank 47
successfully loaded 8 ZeRO state_dicts for rank 62
successfully loaded 8 ZeRO state_dicts for rank 150
successfully loaded 8 ZeRO state_dicts for rank 69
successfully loaded 8 ZeRO state_dicts for rank 157
successfully loaded 8 ZeRO state_dicts for rank 182
successfully loaded 8 ZeRO state_dicts for rank 145
successfully loaded 8 ZeRO state_dicts for rank 79
successfully loaded 8 ZeRO state_dicts for rank 88
successfully loaded 8 ZeRO state_dicts for rank 109
successfully loaded 8 ZeRO state_dicts for rank 56
successfully loaded 8 ZeRO state_dicts for rank 149
successfully loaded 8 ZeRO state_dicts for rank 50
successfully loaded 8 ZeRO state_dicts for rank 42
successfully loaded 8 ZeRO state_dicts for rank 206
successfully loaded 8 ZeRO state_dicts for rank 196
successfully loaded 8 ZeRO state_dicts for rank 80
successfully loaded 8 ZeRO state_dicts for rank 215
successfully loaded 8 ZeRO state_dicts for rank 74
successfully loaded 8 ZeRO state_dicts for rank 43
successfully loaded 8 ZeRO state_dicts for rank 99
successfully loaded 8 ZeRO state_dicts for rank 192
successfully loaded 8 ZeRO state_dicts for rank 78
successfully loaded 8 ZeRO state_dicts for rank 37
successfully loaded 8 ZeRO state_dicts for rank 216
successfully loaded 8 ZeRO state_dicts for rank 153
successfully loaded 8 ZeRO state_dicts for rank 77
loading 8 zero partition checkpoints for rank 126
loading 8 zero partition checkpoints for rank 125
successfully loaded 8 ZeRO state_dicts for rank 193
successfully loaded 8 ZeRO state_dicts for rank 151
successfully loaded 8 ZeRO state_dicts for rank 59
successfully loaded 8 ZeRO state_dicts for rank 180
successfully loaded 8 ZeRO state_dicts for rank 220
successfully loaded 8 ZeRO state_dicts for rank 100
successfully loaded 8 ZeRO state_dicts for rank 107
successfully loaded 8 ZeRO state_dicts for rank 90
successfully loaded 8 ZeRO state_dicts for rank 130
successfully loaded 8 ZeRO state_dicts for rank 163
successfully loaded 8 ZeRO state_dicts for rank 164
successfully loaded 8 ZeRO state_dicts for rank 205
successfully loaded 8 ZeRO state_dicts for rank 94
successfully loaded 8 ZeRO state_dicts for rank 144
successfully loaded 8 ZeRO state_dicts for rank 225
successfully loaded 8 ZeRO state_dicts for rank 25
successfully loaded 8 ZeRO state_dicts for rank 217
successfully loaded 8 ZeRO state_dicts for rank 184
successfully loaded 8 ZeRO state_dicts for rank 172
successfully loaded 8 ZeRO state_dicts for rank 128
successfully loaded 8 ZeRO state_dicts for rank 15
successfully loaded 8 ZeRO state_dicts for rank 131
successfully loaded 8 ZeRO state_dicts for rank 46
successfully loaded 8 ZeRO state_dicts for rank 170
successfully loaded 8 ZeRO state_dicts for rank 198
successfully loaded 8 ZeRO state_dicts for rank 58
successfully loaded 8 ZeRO state_dicts for rank 248
successfully loaded 8 ZeRO state_dicts for rank 13
loading 8 zero partition checkpoints for rank 127
successfully loaded 8 ZeRO state_dicts for rank 183
successfully loaded 8 ZeRO state_dicts for rank 64
successfully loaded 8 ZeRO state_dicts for rank 105
successfully loaded 8 ZeRO state_dicts for rank 55
successfully loaded 8 ZeRO state_dicts for rank 66
successfully loaded 8 ZeRO state_dicts for rank 14
successfully loaded 8 ZeRO state_dicts for rank 240
successfully loaded 8 ZeRO state_dicts for rank 81
successfully loaded 8 ZeRO state_dicts for rank 186
successfully loaded 8 ZeRO state_dicts for rank 65
successfully loaded 8 ZeRO state_dicts for rank 146
successfully loaded 8 ZeRO state_dicts for rank 93
successfully loaded 8 ZeRO state_dicts for rank 200
successfully loaded 8 ZeRO state_dicts for rank 138
successfully loaded 8 ZeRO state_dicts for rank 211
successfully loaded 8 ZeRO state_dicts for rank 45
successfully loaded 8 ZeRO state_dicts for rank 38
successfully loaded 8 ZeRO state_dicts for rank 229
successfully loaded 8 ZeRO state_dicts for rank 129
successfully loaded 8 ZeRO state_dicts for rank 31
successfully loaded 8 ZeRO state_dicts for rank 197
successfully loaded 8 ZeRO state_dicts for rank 177
successfully loaded 8 ZeRO state_dicts for rank 116
successfully loaded 8 ZeRO state_dicts for rank 89
successfully loaded 8 ZeRO state_dicts for rank 117
WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-24 04:02:20 CEST)" was missed by 0:00:03.124446
successfully loaded 8 ZeRO state_dicts for rank 23
successfully loaded 8 ZeRO state_dicts for rank 188
successfully loaded 8 ZeRO state_dicts for rank 137
successfully loaded 8 ZeRO state_dicts for rank 4
successfully loaded 8 ZeRO state_dicts for rank 167
successfully loaded 8 ZeRO state_dicts for rank 236
loading 8 zero partition checkpoints for rank 61
successfully loaded 8 ZeRO state_dicts for rank 207
successfully loaded 8 ZeRO state_dicts for rank 203
successfully loaded 8 ZeRO state_dicts for rank 176
successfully loaded 8 ZeRO state_dicts for rank 174
successfully loaded 8 ZeRO state_dicts for rank 202
successfully loaded 8 ZeRO state_dicts for rank 82
successfully loaded 8 ZeRO state_dicts for rank 169
loading 8 zero partition checkpoints for rank 48
successfully loaded 8 ZeRO state_dicts for rank 209
successfully loaded 8 ZeRO state_dicts for rank 106
successfully loaded 8 ZeRO state_dicts for rank 195
successfully loaded 8 ZeRO state_dicts for rank 136
successfully loaded 8 ZeRO state_dicts for rank 8
successfully loaded 8 ZeRO state_dicts for rank 178
successfully loaded 8 ZeRO state_dicts for rank 219
successfully loaded 8 ZeRO state_dicts for rank 204
successfully loaded 8 ZeRO state_dicts for rank 53
successfully loaded 8 ZeRO state_dicts for rank 235
successfully loaded 8 ZeRO state_dicts for rank 191
loading 8 zero partition checkpoints for rank 60
successfully loaded 8 ZeRO state_dicts for rank 227
successfully loaded 8 ZeRO state_dicts for rank 120
successfully loaded 8 ZeRO state_dicts for rank 175
successfully loaded 8 ZeRO state_dicts for rank 250
successfully loaded 8 ZeRO state_dicts for rank 189
successfully loaded 8 ZeRO state_dicts for rank 6
successfully loaded 8 ZeRO state_dicts for rank 237
successfully loaded 8 ZeRO state_dicts for rank 118
successfully loaded 8 ZeRO state_dicts for rank 119
loading 8 zero partition checkpoints for rank 68
successfully loaded 8 ZeRO state_dicts for rank 22
successfully loaded 8 ZeRO state_dicts for rank 91
successfully loaded 8 ZeRO state_dicts for rank 86
successfully loaded 8 ZeRO state_dicts for rank 83
successfully loaded 8 ZeRO state_dicts for rank 87
successfully loaded 8 ZeRO state_dicts for rank 121
successfully loaded 8 ZeRO state_dicts for rank 218
successfully loaded 8 ZeRO state_dicts for rank 221
loading 8 zero partition checkpoints for rank 113
successfully loaded 8 ZeRO state_dicts for rank 9
successfully loaded 8 ZeRO state_dicts for rank 222
successfully loaded 8 ZeRO state_dicts for rank 251
loading 8 zero partition checkpoints for rank 72
successfully loaded 8 ZeRO state_dicts for rank 179
successfully loaded 8 ZeRO state_dicts for rank 247
successfully loaded 8 ZeRO state_dicts for rank 12
successfully loaded 8 ZeRO state_dicts for rank 29
successfully loaded 8 ZeRO state_dicts for rank 95
successfully loaded 8 ZeRO state_dicts for rank 231
successfully loaded 8 ZeRO state_dicts for rank 239
successfully loaded 8 ZeRO state_dicts for rank 245
loading 8 zero partition checkpoints for rank 32
successfully loaded 8 ZeRO state_dicts for rank 255
successfully loaded 8 ZeRO state_dicts for rank 232
successfully loaded 8 ZeRO state_dicts for rank 238
successfully loaded 8 ZeRO state_dicts for rank 7
successfully loaded 8 ZeRO state_dicts for rank 228
successfully loaded 8 ZeRO state_dicts for rank 67
successfully loaded 8 ZeRO state_dicts for rank 252
successfully loaded 8 ZeRO state_dicts for rank 187
successfully loaded 8 ZeRO state_dicts for rank 230
successfully loaded 8 ZeRO state_dicts for rank 244
successfully loaded 8 ZeRO state_dicts for rank 194
loading 8 zero partition checkpoints for rank 112
loading 8 zero partition checkpoints for rank 135
successfully loaded 8 ZeRO state_dicts for rank 5
successfully loaded 8 ZeRO state_dicts for rank 103
loading 8 zero partition checkpoints for rank 111
successfully loaded 8 ZeRO state_dicts for rank 21
loading 8 zero partition checkpoints for rank 63
successfully loaded 8 ZeRO state_dicts for rank 165
successfully loaded 8 ZeRO state_dicts for rank 54
successfully loaded 8 ZeRO state_dicts for rank 102
successfully loaded 8 ZeRO state_dicts for rank 233
successfully loaded 8 ZeRO state_dicts for rank 85
successfully loaded 8 ZeRO state_dicts for rank 223
successfully loaded 8 ZeRO state_dicts for rank 11
successfully loaded 8 ZeRO state_dicts for rank 226
successfully loaded 8 ZeRO state_dicts for rank 101
loading 8 zero partition checkpoints for rank 160
loading 8 zero partition checkpoints for rank 143
loading 8 zero partition checkpoints for rank 155
successfully loaded 8 ZeRO state_dicts for rank 199
successfully loaded 8 ZeRO state_dicts for rank 1
successfully loaded 8 ZeRO state_dicts for rank 173
successfully loaded 8 ZeRO state_dicts for rank 20
loading 8 zero partition checkpoints for rank
162 loading 8 zero partition checkpoints for rank 76 successfully loaded 8 ZeRO state_dicts for rank 246 successfully loaded 8 ZeRO state_dicts for rank 242 successfully loaded 8 ZeRO state_dicts for rank 254 successfully loaded 8 ZeRO state_dicts for rank 0 successfully loaded 8 ZeRO state_dicts for rank 253 successfully loaded 8 ZeRO state_dicts for rank 2 loading 8 zero partition checkpoints for rank 27 loading 8 zero partition checkpoints for rank 201 loading 8 zero partition checkpoints for rank 33 successfully loaded 8 ZeRO state_dicts for rank 224 loading 8 zero partition checkpoints for rank 185 loading 8 zero partition checkpoints for rank 212 successfully loaded 8 ZeRO state_dicts for rank 122 loading 8 zero partition checkpoints for rank 214 loading 8 zero partition checkpoints for rank 181 loading 8 zero partition checkpoints for rank 114 loading 8 zero partition checkpoints for rank 39 loading 8 zero partition checkpoints for rank 154 successfully loaded 8 ZeRO state_dicts for rank 10 loading 8 zero partition checkpoints for rank 132 successfully loaded 8 ZeRO state_dicts for rank 249 loading 8 zero partition checkpoints for rank 147 successfully loaded 8 ZeRO state_dicts for rank 123 successfully loaded 8 ZeRO state_dicts for rank 57 loading 8 zero partition checkpoints for rank 213 loading 8 zero partition checkpoints for rank 133 loading 8 zero partition checkpoints for rank 35 loading 8 zero partition checkpoints for rank 41 loading 8 zero partition checkpoints for rank 156 successfully loaded 8 ZeRO state_dicts for rank 3 loading 8 zero partition checkpoints for rank 75 loading 8 zero partition checkpoints for rank 148 loading 8 zero partition checkpoints for rank 104 loading 8 zero partition checkpoints for rank 142 successfully loaded 8 ZeRO state_dicts for rank 234 loading 8 zero partition checkpoints for rank 210 loading 8 zero partition checkpoints for rank 52 loading 8 zero partition checkpoints for rank 134 loading 8 zero partition 
checkpoints for rank 70 loading 8 zero partition checkpoints for rank 139 successfully loaded 8 ZeRO state_dicts for rank 30 loading 8 zero partition checkpoints for rank 161 loading 8 zero partition checkpoints for rank 190 loading 8 zero partition checkpoints for rank 51 loading 8 zero partition checkpoints for rank 168 loading 8 zero partition checkpoints for rank 158 loading 8 zero partition checkpoints for rank 208 loading 8 zero partition checkpoints for rank 97 loading 8 zero partition checkpoints for rank 73 loading 8 zero partition checkpoints for rank 152 loading 8 zero partition checkpoints for rank 34 loading 8 zero partition checkpoints for rank 79 loading 8 zero partition checkpoints for rank 108 loading 8 zero partition checkpoints for rank 241 loading 8 zero partition checkpoints for rank 26 loading 8 zero partition checkpoints for rank 88 loading 8 zero partition checkpoints for rank 109 loading 8 zero partition checkpoints for rank 157 loading 8 zero partition checkpoints for rank 40 loading 8 zero partition checkpoints for rank 28 loading 8 zero partition checkpoints for rank 36 loading 8 zero partition checkpoints for rank 215 loading 8 zero partition checkpoints for rank 43 loading 8 zero partition checkpoints for rank 80 loading 8 zero partition checkpoints for rank 47 loading 8 zero partition checkpoints for rank 192 loading 8 zero partition checkpoints for rank 78 loading 8 zero partition checkpoints for rank 150 loading 8 zero partition checkpoints for rank 153 loading 8 zero partition checkpoints for rank 171 loading 8 zero partition checkpoints for rank 182 loading 8 zero partition checkpoints for rank 151 loading 8 zero partition checkpoints for rank 140 loading 8 zero partition checkpoints for rank 159 loading 8 zero partition checkpoints for rank 149 loading 8 zero partition checkpoints for rank 74 loading 8 zero partition checkpoints for rank 77 loading 8 zero partition checkpoints for rank 71 loading 8 zero partition checkpoints for 
rank 141 loading 8 zero partition checkpoints for rank 98 loading 8 zero partition checkpoints for rank 128 loading 8 zero partition checkpoints for rank 206 loading 8 zero partition checkpoints for rank 164 loading 8 zero partition checkpoints for rank 144 loading 8 zero partition checkpoints for rank 62 loading 8 zero partition checkpoints for rank 198 loading 8 zero partition checkpoints for rank 170 loading 8 zero partition checkpoints for rank 180 loading 8 zero partition checkpoints for rank 130 loading 8 zero partition checkpoints for rank 216 loading 8 zero partition checkpoints for rank 100 loading 8 zero partition checkpoints for rank 183 loading 8 zero partition checkpoints for rank 38 loading 8 zero partition checkpoints for rank 205 loading 8 zero partition checkpoints for rank 163 loading 8 zero partition checkpoints for rank 138 loading 8 zero partition checkpoints for rank 184 loading 8 zero partition checkpoints for rank 64 loading 8 zero partition checkpoints for rank 145 loading 8 zero partition checkpoints for rank 211 loading 8 zero partition checkpoints for rank 186 loading 8 zero partition checkpoints for rank 217 loading 8 zero partition checkpoints for rank 81 loading 8 zero partition checkpoints for rank 146 loading 8 zero partition checkpoints for rank 96 loading 8 zero partition checkpoints for rank 137 loading 8 zero partition checkpoints for rank 42 loading 8 zero partition checkpoints for rank 37 loading 8 zero partition checkpoints for rank 44 loading 8 zero partition checkpoints for rank 203 loading 8 zero partition checkpoints for rank 89 loading 8 zero partition checkpoints for rank 69 loading 8 zero partition checkpoints for rank 167 loading 8 zero partition checkpoints for rank 225 loading 8 zero partition checkpoints for rank 219 loading 8 zero partition checkpoints for rank 117 loading 8 zero partition checkpoints for rank 136 loading 8 zero partition checkpoints for rank 209 loading 8 zero partition checkpoints for rank 65 
loading 8 zero partition checkpoints for rank 45 loading 8 zero partition checkpoints for rank 202 loading 8 zero partition checkpoints for rank 166 loading 8 zero partition checkpoints for rank 106 loading 8 zero partition checkpoints for rank 13 loading 8 zero partition checkpoints for rank 196 loading 8 zero partition checkpoints for rank 178 loading 8 zero partition checkpoints for rank 107 loading 8 zero partition checkpoints for rank 200 loading 8 zero partition checkpoints for rank 189 loading 8 zero partition checkpoints for rank 92 loading 8 zero partition checkpoints for rank 110 loading 8 zero partition checkpoints for rank 82 loading 8 zero partition checkpoints for rank 86 loading 8 zero partition checkpoints for rank 4 loading 8 zero partition checkpoints for rank 240 loading 8 zero partition checkpoints for rank 83 loading 8 zero partition checkpoints for rank 56 loading 8 zero partition checkpoints for rank 118 loading 8 zero partition checkpoints for rank 176 loading 8 zero partition checkpoints for rank 105 loading 8 zero partition checkpoints for rank 177 loading 8 zero partition checkpoints for rank 221 loading 8 zero partition checkpoints for rank 222 loading 8 zero partition checkpoints for rank 218 loading 8 zero partition checkpoints for rank 49 loading 8 zero partition checkpoints for rank 169 loading 8 zero partition checkpoints for rank 194 loading 8 zero partition checkpoints for rank 54 loading 8 zero partition checkpoints for rank 250 loading 8 zero partition checkpoints for rank 103 loading 8 zero partition checkpoints for rank 199 loading 8 zero partition checkpoints for rank 187 loading 8 zero partition checkpoints for rank 12 loading 8 zero partition checkpoints for rank 179 loading 8 zero partition checkpoints for rank 29 loading 8 zero partition checkpoints for rank 55 loading 8 zero partition checkpoints for rank 197 loading 8 zero partition checkpoints for rank 24 loading 8 zero partition checkpoints for rank 85 loading 8 zero 
partition checkpoints for rank 58 loading 8 zero partition checkpoints for rank 22 loading 8 zero partition checkpoints for rank 131 loading 8 zero partition checkpoints for rank 229 loading 8 zero partition checkpoints for rank 99 loading 8 zero partition checkpoints for rank 90 loading 8 zero partition checkpoints for rank 232 loading 8 zero partition checkpoints for rank 193 loading 8 zero partition checkpoints for rank 239 loading 8 zero partition checkpoints for rank 23 loading 8 zero partition checkpoints for rank 94 loading 8 zero partition checkpoints for rank 236 loading 8 zero partition checkpoints for rank 129 loading 8 zero partition checkpoints for rank 251 loading 8 zero partition checkpoints for rank 46 loading 8 zero partition checkpoints for rank 21 loading 8 zero partition checkpoints for rank 252 loading 8 zero partition checkpoints for rank 238 loading 8 zero partition checkpoints for rank 7 loading 8 zero partition checkpoints for rank 53 loading 8 zero partition checkpoints for rank 84 loading 8 zero partition checkpoints for rank 254 loading 8 zero partition checkpoints for rank 6 loading 8 zero partition checkpoints for rank 245 loading 8 zero partition checkpoints for rank 246 loading 8 zero partition checkpoints for rank 243 loading 8 zero partition checkpoints for rank 233 loading 8 zero partition checkpoints for rank 1 loading 8 zero partition checkpoints for rank 50 loading 8 zero partition checkpoints for rank 220 loading 8 zero partition checkpoints for rank 195 loading 8 zero partition checkpoints for rank 237 loading 8 zero partition checkpoints for rank 165 loading 8 zero partition checkpoints for rank 230 loading 8 zero partition checkpoints for rank 224 loading 8 zero partition checkpoints for rank 207 loading 8 zero partition checkpoints for rank 2 loading 8 zero partition checkpoints for rank 66 loading 8 zero partition checkpoints for rank 204 loading 8 zero partition checkpoints for rank 59 loading 8 zero partition 
checkpoints for rank 25 loading 8 zero partition checkpoints for rank 5 loading 8 zero partition checkpoints for rank 228 loading 8 zero partition checkpoints for rank 91 loading 8 zero partition checkpoints for rank 231 loading 8 zero partition checkpoints for rank 116 loading 8 zero partition checkpoints for rank 102 loading 8 zero partition checkpoints for rank 20 loading 8 zero partition checkpoints for rank 119 loading 8 zero partition checkpoints for rank 101 loading 8 zero partition checkpoints for rank 67 loading 8 zero partition checkpoints for rank 93 loading 8 zero partition checkpoints for rank 242 loading 8 zero partition checkpoints for rank 188 loading 8 zero partition checkpoints for rank 87 loading 8 zero partition checkpoints for rank 247 loading 8 zero partition checkpoints for rank 0 loading 8 zero partition checkpoints for rank 244 checkpoint version 3.0 loading 8 zero partition checkpoints for rank 223 loading 8 zero partition checkpoints for rank 191 loading 8 zero partition checkpoints for rank 31 loading 8 zero partition checkpoints for rank 57 loading 8 zero partition checkpoints for rank 95 loading 8 zero partition checkpoints for rank 15 loading 8 zero partition checkpoints for rank 248 loading 8 zero partition checkpoints for rank 120 loading 8 zero partition checkpoints for rank 14 loading 8 zero partition checkpoints for rank 235 loading 8 zero partition checkpoints for rank 3 loading 8 zero partition checkpoints for rank 121 loading 8 zero partition checkpoints for rank 255 loading 8 zero partition checkpoints for rank 172 loading 8 zero partition checkpoints for rank 253 loading 8 zero partition checkpoints for rank 227 loading 8 zero partition checkpoints for rank 249 loading 8 zero partition checkpoints for rank 30 loading 8 zero partition checkpoints for rank 174 loading 8 zero partition checkpoints for rank 226 loading 8 zero partition checkpoints for rank 234 loading 8 zero partition checkpoints for rank 175 loading 8 zero 
partition checkpoints for rank 173 loading 8 zero partition checkpoints for rank 122 loading 8 zero partition checkpoints for rank 123 loading 8 zero partition checkpoints for rank 8 loading 8 zero partition checkpoints for rank 9 loading 8 zero partition checkpoints for rank 11 loading 8 zero partition checkpoints for rank 10 successfully loaded 8 ZeRO state_dicts for rank 18 successfully loaded 8 ZeRO state_dicts for rank 16 successfully loaded 8 ZeRO state_dicts for rank 17 loading 8 zero partition checkpoints for rank 18 successfully loaded 8 ZeRO state_dicts for rank 19 loading 8 zero partition checkpoints for rank 16 loading 8 zero partition checkpoints for rank 17 loading 8 zero partition checkpoints for rank 19 successfully loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints at iteration 474 time (ms) | load-checkpoint: 86577.34 [after model, optimizer, and learning rate scheduler are built] datetime: 2021-09-24 04:03:15 > building train, validation, and test datasets ... > datasets target sizes (minimum size): train: 300000000 validation: 1638400 test: 10240 > building train, validation, and test datasets for GPT ... > building dataset index ... reading sizes... reading pointers... reading document index... creating numpy buffer of mmap... creating memory view of numpy buffer... 
> finished creating indexed dataset in 0.164226 seconds
    number of documents: 304230423
> dataset split:
    train: document indices in [0, 288714672) total of 288714672 documents
    validation: document indices in [288714672, 303926193) total of 15211521 documents
    test: document indices in [303926193, 304230423) total of 304230 documents
> loading doc-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_train_indexmap_300000000ns_2048sl_42s_doc_idx.npy
> loading sample-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_train_indexmap_300000000ns_2048sl_42s_sample_idx.npy
> loading shuffle-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_train_indexmap_300000000ns_2048sl_42s_shuffle_idx.npy
    loaded indexed file in 0.365 seconds
    total number of samples: 394611670
    total number of epochs: 3
> loading doc-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_valid_indexmap_1638400ns_2048sl_42s_doc_idx.npy
> loading sample-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_valid_indexmap_1638400ns_2048sl_42s_sample_idx.npy
> loading shuffle-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_valid_indexmap_1638400ns_2048sl_42s_shuffle_idx.npy
    loaded indexed file in 0.203 seconds
    total number of samples: 6927161
    total number of epochs: 1
> loading doc-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_test_indexmap_10240ns_2048sl_42s_doc_idx.npy
> loading sample-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_test_indexmap_10240ns_2048sl_42s_sample_idx.npy
> loading shuffle-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_test_indexmap_10240ns_2048sl_42s_shuffle_idx.npy
    loaded indexed file in 0.072 seconds
    total number of samples: 137384
    total number of epochs: 1
> finished creating GPT datasets ...
[after dataloaders are built] datetime: 2021-09-24 04:03:22
done with setup ...
training ...
time (ms) | model-and-optimizer-setup: 94922.27 | train/valid/test-data-iterators-setup: 5644.20
[before the start of training step] datetime: 2021-09-24 04:03:22
[2021-09-24 04:03:22,280] [INFO] [checkpointing.py:408:forward] Activation Checkpointing Information
[2021-09-24 04:03:22,280] [INFO] [checkpointing.py:409:forward] ----Partition Activations False, CPU CHECKPOINTING False
[2021-09-24 04:03:22,281] [INFO] [checkpointing.py:412:forward] ----contiguous Memory Checkpointing False with 32 total layers
[2021-09-24 04:03:22,281] [INFO] [checkpointing.py:415:forward] ----Synchronization False
[2021-09-24 04:03:22,281] [INFO] [checkpointing.py:416:forward] ----Profiling time in checkpointing False
[2021-09-24 04:03:47] PULSE: tr8-104B is waiting to be scheduled (1159457_[1-10%1] on 'gpu_p13' partition)
[2021-09-24 04:03:47] PULSE: tr8-104B is scheduled to start in 18:10:24 (at 2021-09-24T22:14:12) (1161605 on 'gpu_p13' partition)
[2021-09-24 04:03:47] PULSE: tr8-104B is running for 2:42 since 2021-09-24T04:01:05 (1162747 on 'gpu_p13' partition (r6i4n[5,7],r6i5n[2,7-8],r6i6n[0,2,6],r7i2n[4-5],r7i6n[2-4],r7i7n[7-8],r8i0n[2-3,5-8],r8i1n[0,2-4],r8i2n8,r8i3n[0-2],r8i5n[3-4],r8i7n[3-8],r9i0n[0-2],r9i1n[0-3],r9i2n[3-5,8],r9i3n[0-1,7-8],r9i4n[0-2],r9i5n[3-8],r9i6n[0,7-8])
[Rank 33] (after 475 iterations) memory (MB) | allocated: 5861.5498046875 | max allocated: 10450.46337890625 | reserved: 18442.0 | max reserved: 18442.0
[Rank 65] (after 475 iterations) memory (MB) | allocated: 5861.5498046875 | max allocated: 10450.46337890625 | reserved: 18826.0 | max reserved: 18826.0
[Rank 1] (after 475 iterations) memory (MB) | allocated: 6661.611328125 | max allocated: 11742.55810546875 | reserved: 21150.0 | max reserved: 21150.0
[Rank 225] (after 475 iterations) memory (MB) | allocated: 7107.70751953125 | max allocated: 11884.6845703125 | reserved: 22108.0 | max reserved: 22108.0
[Rank 97] (after 475 iterations) memory (MB) | allocated: 5861.5498046875 | max allocated: 10450.46337890625 | reserved: 18442.0 | max reserved: 18442.0
[Rank 129] (after 475 iterations) memory (MB) | allocated: 5861.5498046875 | max allocated: 10450.46337890625 | reserved: 18586.0 | max reserved: 18586.0
[Rank 193] (after 475 iterations) memory (MB) | allocated: 5861.5498046875 | max allocated: 10450.46337890625 | reserved: 18778.0 | max reserved: 18778.0
[Rank 161] (after 475 iterations) memory (MB) | allocated: 5861.5498046875 | max allocated: 10450.46337890625 | reserved: 18442.0 | max reserved: 18442.0
[Rank 2] (after 475 iterations) memory (MB) | allocated: 6661.611328125 | max allocated: 11742.55810546875 | reserved: 22878.0 | max reserved: 22878.0
[Rank 226] (after 475 iterations) memory (MB) | allocated: 7107.70751953125 | max allocated: 11884.6845703125 | reserved: 20752.0 | max reserved: 20752.0
[Rank 34] (after 475 iterations) memory (MB) | allocated: 5861.5498046875 | max allocated: 10450.46337890625 | reserved: 18586.0 | max reserved: 18586.0
[Rank 66] (after 475 iterations) memory (MB) | allocated: 5861.5498046875 | max allocated: 10450.46337890625 | reserved: 18586.0 | max reserved: 18586.0
[Rank 98] (after 475 iterations) memory (MB) | allocated: 5861.5498046875 | max allocated: 10450.46337890625 | reserved: 18522.0 | max reserved: 18522.0
[Rank 130] (after 475 iterations) memory (MB) | allocated: 5861.5498046875 | max allocated: 10450.46337890625 | reserved: 18586.0 | max reserved: 18586.0
[Rank 194] (after 475 iterations) memory (MB) | allocated: 5861.5498046875 | max allocated: 10450.46337890625 | reserved: 18442.0 | max reserved: 18442.0
[Rank 162] (after 475 iterations) memory (MB) | allocated: 5861.5498046875 | max allocated: 10450.46337890625 | reserved: 18442.0 | max reserved: 18442.0
[Rank 0] (after 475 iterations) memory (MB) | allocated: 6661.611328125 | max allocated: 11742.55810546875 | reserved: 23514.0 | max reserved: 23514.0
[Rank 224] (after 475 iterations) memory (MB) | allocated: 7107.70751953125 | max allocated: 11884.6845703125 | reserved: 22108.0 | max reserved: 22108.0
[Rank 32] (after 475 iterations) memory (MB) | allocated: 5861.5498046875 | max allocated: 10450.46337890625 | reserved: 19012.0 | max reserved: 19012.0
[Rank 64] (after 475 iterations) memory (MB) | allocated: 5861.5498046875 | max allocated: 10450.46337890625 | reserved: 19012.0 | max reserved: 19012.0
[Rank 96] (after 475 iterations) memory (MB) | allocated: 5861.5498046875 | max allocated: 10450.46337890625 | reserved: 19012.0 | max reserved: 19012.0
[Rank 192] (after 475 iterations) memory (MB) | allocated: 5861.5498046875 | max allocated: 10450.46337890625 | reserved: 18884.0 | max reserved: 18884.0
[Rank 128] (after 475 iterations) memory (MB) | allocated: 5861.5498046875 | max allocated: 10450.46337890625 | reserved: 18884.0 | max reserved: 18884.0
[Rank 160] (after 475 iterations) memory (MB) | allocated: 5861.5498046875 | max allocated: 10450.46337890625 | reserved: 18868.0 | max reserved: 18868.0
[Rank 3] (after 475 iterations) memory (MB) | allocated: 6661.611328125 | max allocated: 11742.55810546875 | reserved: 22890.0 | max reserved: 22890.0
[Rank 35] (after 475 iterations) memory (MB) | allocated: 5861.5498046875 | max allocated: 10450.46337890625 | reserved: 18442.0 | max reserved: 18442.0
[Rank 227] (after 475 iterations) memory (MB) | allocated: 7107.70751953125 | max allocated: 11884.6845703125 | reserved: 20752.0 | max reserved: 20752.0
[Rank 67] (after 475 iterations) memory (MB) | allocated: 5861.5498046875 | max allocated: 10450.46337890625 | reserved: 18586.0 | max reserved: 18586.0
[Rank 99] (after 475 iterations) memory (MB) | allocated: 5861.5498046875 | max allocated: 10450.46337890625 | reserved: 18442.0 | max reserved: 18442.0
[Rank 131] (after 475 iterations) memory (MB) | allocated: 5861.5498046875 | max allocated: 10450.46337890625 | reserved: 18522.0 | max reserved: 18522.0
[Rank 195] (after 475 iterations) memory (MB) | allocated: 5861.5498046875 | max allocated: 10450.46337890625 | reserved: 18442.0 | max reserved: 18442.0
[Rank 163] (after 475 iterations) memory (MB) | allocated: 5861.5498046875 | max allocated: 10450.46337890625 | reserved: 18442.0 | max reserved: 18442.0
iteration 475/ 159576 | consumed samples: 7600 | elapsed time per iteration (ms): 29962.7 | learning rate: 2.108E-06 | global batch size: 16 | lm loss: 7.833103E+00 | loss scale: 4096.0 | grad norm: 47969.708 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 476/ 159576 | consumed samples: 7616 | elapsed time per iteration (ms): 13562.3 | learning rate: 2.112E-06 | global batch size: 16 | lm loss: 7.715385E+00 | loss scale: 4096.0 | grad norm: 28643.174 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 477/ 159576 | consumed samples: 7632 | elapsed time per iteration (ms): 14532.6 | learning rate: 2.117E-06 | global batch size: 16 | lm loss: 7.912835E+00 | loss scale: 4096.0 | grad norm: 18978.073 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 478/ 159576 | consumed samples: 7648 | elapsed time per iteration (ms): 13659.0 | learning rate: 2.121E-06 | global batch size: 16 | lm loss: 7.845491E+00 | loss scale: 4096.0 | grad norm: 29417.161 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 479/ 159576 | consumed samples: 7664 | elapsed time per iteration (ms): 13928.5 | learning rate: 2.126E-06 | global batch size: 16 | lm loss: 7.818515E+00 | loss scale: 4096.0 | grad norm: 24185.570 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 480/ 159576 | consumed samples: 7680 | elapsed time per iteration (ms): 13863.2 | learning rate: 2.130E-06 | global batch size: 16 | lm loss: 7.759526E+00 | loss scale: 4096.0 | grad norm: 18058.893 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 481/ 159576 | consumed samples: 7696 | elapsed time per iteration (ms): 13613.0 | learning rate: 2.135E-06 | global batch size: 16 | lm loss: 7.666837E+00 | loss scale: 4096.0 | grad norm: 21581.295 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 482/ 159576 | consumed samples: 7712 | elapsed time per iteration (ms): 13350.8 | learning rate: 2.139E-06 | global batch size: 16 | lm loss: 7.929407E+00 | loss scale: 4096.0 | grad norm: 22311.348 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 483/ 159576 | consumed samples: 7728 | elapsed time per iteration (ms): 13819.2 | learning rate: 2.143E-06 | global batch size: 16 | lm loss: 7.786575E+00 | loss scale: 4096.0 | grad norm: 23821.522 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 484/ 159576 | consumed samples: 7744 | elapsed time per iteration (ms): 13697.3 | learning rate: 2.148E-06 | global batch size: 16 | lm loss: 7.834505E+00 | loss scale: 4096.0 | grad norm: 18706.902 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 485/ 159576 | consumed samples: 7760 | elapsed time per iteration (ms): 13285.4 | learning rate: 2.152E-06 | global batch size: 16 | lm loss: 7.796403E+00 | loss scale: 4096.0 | grad norm: 23055.088 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 486/ 159576 | consumed samples: 7776 | elapsed time per iteration (ms): 13893.0 | learning rate: 2.157E-06 | global batch size: 16 | lm loss: 7.853868E+00 | loss scale: 4096.0 | grad norm: 16300.893 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 487/ 159576 | consumed samples: 7792 | elapsed time per iteration (ms): 14059.7 | learning rate: 2.161E-06 | global batch size: 16 | lm loss: 7.943846E+00 | loss scale: 4096.0 | grad norm: 18420.386 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 488/ 159576 | consumed samples: 7808 | elapsed time per iteration (ms): 13994.0 | learning rate: 2.166E-06 | global batch size: 16 | lm loss: 7.850654E+00 | loss scale: 4096.0 | grad norm: 17235.839 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 489/ 159576 | consumed samples: 7824 | elapsed time per iteration (ms): 13596.2 | learning rate: 2.170E-06 | global batch size: 16 | lm loss: 7.825228E+00 | loss scale: 4096.0 | grad norm: 16217.059 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 490/ 159576 | consumed samples: 7840 | elapsed time per iteration (ms): 14562.4 | learning rate: 2.175E-06 | global batch size: 16 | lm loss: 7.944909E+00 | loss scale: 4096.0 | grad norm: 20367.528 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 491/ 159576 | consumed samples: 7856 | elapsed time per iteration (ms): 13373.8 | learning rate: 2.179E-06 | global batch size: 16 | lm loss: 7.772738E+00 | loss scale: 4096.0 | grad norm: 14868.924 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 492/ 159576 | consumed samples: 7872 | elapsed time per iteration (ms): 13407.0 | learning rate: 2.183E-06 | global batch size: 16 | lm loss: 7.807293E+00 | loss scale: 4096.0 | grad norm: 12933.190 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 493/ 159576 | consumed samples: 7888 | elapsed time per iteration (ms): 13535.9 | learning rate: 2.188E-06 | global batch size: 16 | lm loss: 7.796512E+00 | loss scale: 4096.0 | grad norm: 14067.056 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 494/ 159576 | consumed samples: 7904 | elapsed time per iteration (ms): 13629.5 | learning rate: 2.192E-06 | global batch size: 16 | lm loss: 7.792056E+00 | loss scale: 4096.0 | grad norm: 14953.693 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 495/ 159576 | consumed samples: 7920 | elapsed time per iteration (ms): 14163.4 | learning rate: 2.197E-06 | global batch size: 16 | lm loss: 7.703032E+00 | loss scale: 4096.0 | grad norm: 14533.162 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 496/ 159576 | consumed samples: 7936 | elapsed time per iteration (ms): 13588.6 | learning rate: 2.201E-06 | global batch size: 16 | lm loss: 7.740438E+00 | loss scale: 4096.0 | grad norm: 13505.957 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 497/ 159576 | consumed samples: 7952 | elapsed time per iteration (ms): 13861.0 | learning rate: 2.206E-06 | global batch size: 16 | lm loss: 7.741710E+00 | loss scale: 4096.0 | grad norm: 15979.829 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 498/ 159576 | consumed samples: 7968 | elapsed time per iteration (ms): 13984.2 | learning rate: 2.210E-06 | global batch size: 16 | lm loss: 7.999316E+00 | loss scale: 4096.0 | grad norm: 17409.113 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 499/ 159576 | consumed samples: 7984 | elapsed time per iteration (ms): 13944.3 | learning rate: 2.214E-06 | global batch size: 16 | lm loss: 7.852047E+00 | loss scale: 4096.0 | grad norm: 17274.017 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 500/ 159576 | consumed samples: 8000 | elapsed time per iteration (ms): 13842.0 | learning rate: 2.219E-06 | global batch size: 16 | lm loss: 7.828729E+00 | loss scale: 8192.0 | grad norm: 13323.901 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 501/ 159576 | consumed samples: 8016 | elapsed time per iteration (ms): 13887.5 | learning rate: 2.223E-06 | global batch size: 16 | lm loss: 7.889397E+00 | loss scale: 8192.0 | grad norm: 36733.789 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 502/ 159576 | consumed samples: 8032 | elapsed time per iteration (ms): 14250.0 | learning rate: 2.228E-06 | global batch size: 16 | lm loss: 7.699535E+00 | loss scale: 8192.0 | grad norm: 25128.484 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 503/ 159576 | consumed samples: 8048 | elapsed time per iteration (ms): 14013.2 | learning rate: 2.232E-06 | global batch size: 16 | lm loss: 7.717435E+00 | loss scale: 8192.0 | grad norm: 27928.260 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 504/ 159576 | consumed samples: 8064 | elapsed time per iteration (ms): 13885.3 | learning rate: 2.237E-06 | global batch size: 16 | lm loss: 7.793045E+00 | loss scale: 8192.0 | grad norm: 25342.573 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 505/ 159576 | consumed samples: 8080 | elapsed time per iteration (ms): 14216.7 | learning rate: 2.241E-06 | global batch size: 16 | lm loss: 7.810180E+00 | loss scale: 8192.0 | grad norm: 32722.154 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 506/ 159576 | consumed samples: 8096 | elapsed time per iteration (ms): 13476.3 | learning rate: 2.246E-06 | global batch size: 16 | lm loss: 7.789536E+00 | loss scale: 8192.0 | grad norm: 28438.282 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 507/ 159576 | consumed samples: 8112 | elapsed time per iteration (ms): 13866.3 | learning rate: 2.250E-06 | global batch size: 16 | lm loss: 7.752525E+00 | loss scale: 8192.0 | grad norm: 38662.247 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 508/ 159576 | consumed samples: 8128 | elapsed time per iteration (ms): 14262.5 | learning rate: 2.254E-06 | global batch size: 16 | lm loss: 7.916237E+00 | loss scale: 8192.0 | grad norm: 36720.277 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 509/ 159576 | consumed samples: 8144 | elapsed time per iteration (ms): 13929.6 | learning rate: 2.259E-06 | global batch size: 16 | lm loss: 7.943053E+00 | loss scale: 8192.0 | grad norm: 38847.168 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 510/ 159576 | consumed samples: 8160 | elapsed time per iteration (ms): 13830.3 | learning rate: 2.263E-06 | global batch size: 16 | lm loss: 7.853089E+00 | loss scale: 8192.0 | grad norm: 37581.397 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 511/ 159576 | consumed samples: 8176 | elapsed time per iteration (ms): 13826.8 | learning rate: 2.268E-06 | global batch size: 16 | lm loss: 7.664119E+00 | loss scale: 8192.0 | grad norm: 34046.642 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 512/ 159576 | consumed samples: 8192 | elapsed time per iteration (ms): 14623.1 | learning rate: 2.272E-06 | global batch size: 16 | lm loss: 7.786874E+00 | loss scale: 8192.0 | grad norm: 28303.899 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 513/ 159576 | consumed samples: 8208 | elapsed time per iteration (ms): 13633.3 | learning rate: 2.277E-06 |
global batch size: 16 | lm loss: 7.763934E+00 | loss scale: 8192.0 | grad norm: 32905.082 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 514/ 159576 | consumed samples: 8224 | elapsed time per iteration (ms): 13562.5 | learning rate: 2.281E-06 | global batch size: 16 | lm loss: 7.825607E+00 | loss scale: 8192.0 | grad norm: 32400.005 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 515/ 159576 | consumed samples: 8240 | elapsed time per iteration (ms): 13893.1 | learning rate: 2.286E-06 | global batch size: 16 | lm loss: 7.780645E+00 | loss scale: 8192.0 | grad norm: 39597.501 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 516/ 159576 | consumed samples: 8256 | elapsed time per iteration (ms): 13943.0 | learning rate: 2.290E-06 | global batch size: 16 | lm loss: 7.949652E+00 | loss scale: 8192.0 | grad norm: 29624.844 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 517/ 159576 | consumed samples: 8272 | elapsed time per iteration (ms): 13457.2 | learning rate: 2.294E-06 | global batch size: 16 | lm loss: 7.840482E+00 | loss scale: 8192.0 | grad norm: 34709.122 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) [2021-09-24 04:13:42] PULSE: tr8-104B is waiting for the previous job to finish before scheduling a new one using the dependency mechanism (1162855_[1-10%1] on 'gpu_p13' partition) [2021-09-24 04:13:42] PULSE: tr8-104B is running for 12:37 since 2021-09-24T04:01:05 (1162747 on 'gpu_p13' partition (r6i4n[5,7],r6i5n[2,7-8],r6i6n[0,2,6],r7i2n[4-5],r7i6n[2-4],r7i7n[7-8],r8i0n[2-3,5-8],r8i1n[0,2-4],r8i2n8,r8i3n[0-2],r8i5n[3-4],r8i7n[3-8],r9i0n[0-2],r9i1n[0-3],r9i2n[3-5,8],r9i3n[0-1,7-8],r9i4n[0-2],r9i5n[3-8],r9i6n[0,7-8]) iteration 518/ 159576 | consumed samples: 8288 | elapsed time per iteration (ms): 
13506.3 | learning rate: 2.299E-06 | global batch size: 16 | lm loss: 7.914812E+00 | loss scale: 8192.0 | grad norm: 24295.892 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 519/ 159576 | consumed samples: 8304 | elapsed time per iteration (ms): 14169.8 | learning rate: 2.303E-06 | global batch size: 16 | lm loss: 7.710842E+00 | loss scale: 8192.0 | grad norm: 32528.032 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 520/ 159576 | consumed samples: 8320 | elapsed time per iteration (ms): 13829.9 | learning rate: 2.308E-06 | global batch size: 16 | lm loss: 7.806552E+00 | loss scale: 8192.0 | grad norm: 37677.096 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 521/ 159576 | consumed samples: 8336 | elapsed time per iteration (ms): 13564.6 | learning rate: 2.312E-06 | global batch size: 16 | lm loss: 7.817222E+00 | loss scale: 8192.0 | grad norm: 30827.133 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 522/ 159576 | consumed samples: 8352 | elapsed time per iteration (ms): 13848.1 | learning rate: 2.317E-06 | global batch size: 16 | lm loss: 7.805755E+00 | loss scale: 8192.0 | grad norm: 31599.999 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 523/ 159576 | consumed samples: 8368 | elapsed time per iteration (ms): 13893.6 | learning rate: 2.321E-06 | global batch size: 16 | lm loss: 7.845006E+00 | loss scale: 8192.0 | grad norm: 34359.630 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 524/ 159576 | consumed samples: 8384 | elapsed time per iteration (ms): 13874.2 | learning rate: 2.325E-06 | global batch size: 16 | lm loss: 7.806132E+00 | loss scale: 8192.0 | grad norm: 34509.027 | num zeros: 0.0 | number of skipped iterations: 0 | number 
of nan iterations: 0 | time (ms) iteration 525/ 159576 | consumed samples: 8400 | elapsed time per iteration (ms): 14357.0 | learning rate: 2.330E-06 | global batch size: 16 | lm loss: 7.713592E+00 | loss scale: 8192.0 | grad norm: 36961.324 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 526/ 159576 | consumed samples: 8416 | elapsed time per iteration (ms): 14049.5 | learning rate: 2.334E-06 | global batch size: 16 | lm loss: 7.744096E+00 | loss scale: 8192.0 | grad norm: 46754.633 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 527/ 159576 | consumed samples: 8432 | elapsed time per iteration (ms): 14142.6 | learning rate: 2.339E-06 | global batch size: 16 | lm loss: 7.798402E+00 | loss scale: 8192.0 | grad norm: 38396.563 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 528/ 159576 | consumed samples: 8448 | elapsed time per iteration (ms): 13474.9 | learning rate: 2.343E-06 | global batch size: 16 | lm loss: 7.987565E+00 | loss scale: 8192.0 | grad norm: 36935.417 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 529/ 159576 | consumed samples: 8464 | elapsed time per iteration (ms): 14180.8 | learning rate: 2.348E-06 | global batch size: 16 | lm loss: 7.766053E+00 | loss scale: 8192.0 | grad norm: 35413.152 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 530/ 159576 | consumed samples: 8480 | elapsed time per iteration (ms): 13844.6 | learning rate: 2.352E-06 | global batch size: 16 | lm loss: 7.906172E+00 | loss scale: 8192.0 | grad norm: 26808.092 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 531/ 159576 | consumed samples: 8496 | elapsed time per iteration (ms): 13786.0 | learning rate: 2.357E-06 | global batch size: 16 | lm loss: 
7.840616E+00 | loss scale: 8192.0 | grad norm: 38477.035 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 532/ 159576 | consumed samples: 8512 | elapsed time per iteration (ms): 13935.0 | learning rate: 2.361E-06 | global batch size: 16 | lm loss: 7.367872E+00 | loss scale: 8192.0 | grad norm: 51156.079 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 533/ 159576 | consumed samples: 8528 | elapsed time per iteration (ms): 14022.6 | learning rate: 2.365E-06 | global batch size: 16 | lm loss: 7.941976E+00 | loss scale: 8192.0 | grad norm: 46439.024 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 534/ 159576 | consumed samples: 8544 | elapsed time per iteration (ms): 14296.7 | learning rate: 2.370E-06 | global batch size: 16 | lm loss: 7.869607E+00 | loss scale: 8192.0 | grad norm: 29876.193 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 535/ 159576 | consumed samples: 8560 | elapsed time per iteration (ms): 13470.0 | learning rate: 2.374E-06 | global batch size: 16 | lm loss: 7.635067E+00 | loss scale: 8192.0 | grad norm: 34076.920 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 536/ 159576 | consumed samples: 8576 | elapsed time per iteration (ms): 13796.1 | learning rate: 2.379E-06 | global batch size: 16 | lm loss: 7.842813E+00 | loss scale: 8192.0 | grad norm: 41800.450 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 537/ 159576 | consumed samples: 8592 | elapsed time per iteration (ms): 13818.0 | learning rate: 2.383E-06 | global batch size: 16 | lm loss: 7.984433E+00 | loss scale: 8192.0 | grad norm: 38203.372 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 538/ 159576 | consumed 
samples: 8608 | elapsed time per iteration (ms): 14109.2 | learning rate: 2.388E-06 | global batch size: 16 | lm loss: 7.724606E+00 | loss scale: 8192.0 | grad norm: 44792.862 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 539/ 159576 | consumed samples: 8624 | elapsed time per iteration (ms): 13906.3 | learning rate: 2.392E-06 | global batch size: 16 | lm loss: 7.800515E+00 | loss scale: 8192.0 | grad norm: 32297.704 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 540/ 159576 | consumed samples: 8640 | elapsed time per iteration (ms): 14143.5 | learning rate: 2.396E-06 | global batch size: 16 | lm loss: 7.871832E+00 | loss scale: 8192.0 | grad norm: 43120.437 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 541/ 159576 | consumed samples: 8656 | elapsed time per iteration (ms): 14084.0 | learning rate: 2.401E-06 | global batch size: 16 | lm loss: 7.872537E+00 | loss scale: 8192.0 | grad norm: 36867.265 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 542/ 159576 | consumed samples: 8672 | elapsed time per iteration (ms): 13874.8 | learning rate: 2.405E-06 | global batch size: 16 | lm loss: 7.777860E+00 | loss scale: 8192.0 | grad norm: 43001.704 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 543/ 159576 | consumed samples: 8688 | elapsed time per iteration (ms): 13779.4 | learning rate: 2.410E-06 | global batch size: 16 | lm loss: 7.682357E+00 | loss scale: 8192.0 | grad norm: 57139.433 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 544/ 159576 | consumed samples: 8704 | elapsed time per iteration (ms): 14017.8 | learning rate: 2.414E-06 | global batch size: 16 | lm loss: 7.819186E+00 | loss scale: 8192.0 | grad norm: 29983.983 | num 
zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 545/ 159576 | consumed samples: 8720 | elapsed time per iteration (ms): 13847.0 | learning rate: 2.419E-06 | global batch size: 16 | lm loss: 7.843667E+00 | loss scale: 8192.0 | grad norm: 66015.612 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 546/ 159576 | consumed samples: 8736 | elapsed time per iteration (ms): 13982.1 | learning rate: 2.423E-06 | global batch size: 16 | lm loss: 7.894298E+00 | loss scale: 8192.0 | grad norm: 51768.956 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 547/ 159576 | consumed samples: 8752 | elapsed time per iteration (ms): 14302.0 | learning rate: 2.428E-06 | global batch size: 16 | lm loss: 7.715273E+00 | loss scale: 8192.0 | grad norm: 39105.868 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 548/ 159576 | consumed samples: 8768 | elapsed time per iteration (ms): 14035.0 | learning rate: 2.432E-06 | global batch size: 16 | lm loss: 7.707379E+00 | loss scale: 8192.0 | grad norm: 39549.896 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 549/ 159576 | consumed samples: 8784 | elapsed time per iteration (ms): 13590.6 | learning rate: 2.436E-06 | global batch size: 16 | lm loss: 7.786090E+00 | loss scale: 8192.0 | grad norm: 29894.490 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 550/ 159576 | consumed samples: 8800 | elapsed time per iteration (ms): 13742.1 | learning rate: 2.441E-06 | global batch size: 16 | lm loss: 7.726188E+00 | loss scale: 8192.0 | grad norm: 34821.397 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 551/ 159576 | consumed samples: 8816 | elapsed time per iteration (ms): 13975.5 | learning 
rate: 2.445E-06 | global batch size: 16 | lm loss: 7.823754E+00 | loss scale: 8192.0 | grad norm: 41726.396 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 552/ 159576 | consumed samples: 8832 | elapsed time per iteration (ms): 13862.7 | learning rate: 2.450E-06 | global batch size: 16 | lm loss: 7.780801E+00 | loss scale: 8192.0 | grad norm: 39107.293 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 553/ 159576 | consumed samples: 8848 | elapsed time per iteration (ms): 13828.8 | learning rate: 2.454E-06 | global batch size: 16 | lm loss: 7.722218E+00 | loss scale: 8192.0 | grad norm: 34436.410 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 554/ 159576 | consumed samples: 8864 | elapsed time per iteration (ms): 14180.4 | learning rate: 2.459E-06 | global batch size: 16 | lm loss: 7.731545E+00 | loss scale: 8192.0 | grad norm: 26819.965 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 555/ 159576 | consumed samples: 8880 | elapsed time per iteration (ms): 14282.2 | learning rate: 2.463E-06 | global batch size: 16 | lm loss: 7.705241E+00 | loss scale: 8192.0 | grad norm: 49659.971 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 556/ 159576 | consumed samples: 8896 | elapsed time per iteration (ms): 13646.8 | learning rate: 2.467E-06 | global batch size: 16 | lm loss: 8.003874E+00 | loss scale: 8192.0 | grad norm: 37645.277 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 557/ 159576 | consumed samples: 8912 | elapsed time per iteration (ms): 13958.8 | learning rate: 2.472E-06 | global batch size: 16 | lm loss: 7.782984E+00 | loss scale: 8192.0 | grad norm: 61655.017 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 
0 | time (ms) iteration 558/ 159576 | consumed samples: 8928 | elapsed time per iteration (ms): 13955.4 | learning rate: 2.476E-06 | global batch size: 16 | lm loss: 7.811559E+00 | loss scale: 8192.0 | grad norm: 48428.452 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 559/ 159576 | consumed samples: 8944 | elapsed time per iteration (ms): 14457.4 | learning rate: 2.481E-06 | global batch size: 16 | lm loss: 7.931767E+00 | loss scale: 8192.0 | grad norm: 38443.785 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 560/ 159576 | consumed samples: 8960 | elapsed time per iteration (ms): 13823.4 | learning rate: 2.485E-06 | global batch size: 16 | lm loss: 7.793911E+00 | loss scale: 8192.0 | grad norm: 40207.993 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 561/ 159576 | consumed samples: 8976 | elapsed time per iteration (ms): 13982.4 | learning rate: 2.490E-06 | global batch size: 16 | lm loss: 7.842747E+00 | loss scale: 8192.0 | grad norm: 36711.017 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 562/ 159576 | consumed samples: 8992 | elapsed time per iteration (ms): 14372.1 | learning rate: 2.494E-06 | global batch size: 16 | lm loss: 7.878882E+00 | loss scale: 8192.0 | grad norm: 54306.049 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 563/ 159576 | consumed samples: 9008 | elapsed time per iteration (ms): 13678.7 | learning rate: 2.499E-06 | global batch size: 16 | lm loss: 7.849220E+00 | loss scale: 8192.0 | grad norm: 37543.010 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 564/ 159576 | consumed samples: 9024 | elapsed time per iteration (ms): 14069.8 | learning rate: 2.503E-06 | global batch size: 16 | lm loss: 7.844311E+00 | loss 
scale: 8192.0 | grad norm: 44716.799 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 565/ 159576 | consumed samples: 9040 | elapsed time per iteration (ms): 13957.6 | learning rate: 2.507E-06 | global batch size: 16 | lm loss: 7.913968E+00 | loss scale: 8192.0 | grad norm: 47566.400 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 566/ 159576 | consumed samples: 9056 | elapsed time per iteration (ms): 14044.6 | learning rate: 2.512E-06 | global batch size: 16 | lm loss: 7.683057E+00 | loss scale: 8192.0 | grad norm: 46568.215 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 567/ 159576 | consumed samples: 9072 | elapsed time per iteration (ms): 13881.5 | learning rate: 2.516E-06 | global batch size: 16 | lm loss: 7.870160E+00 | loss scale: 8192.0 | grad norm: 41402.594 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 568/ 159576 | consumed samples: 9088 | elapsed time per iteration (ms): 14311.0 | learning rate: 2.521E-06 | global batch size: 16 | lm loss: 7.629350E+00 | loss scale: 8192.0 | grad norm: 39843.869 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 569/ 159576 | consumed samples: 9104 | elapsed time per iteration (ms): 14124.8 | learning rate: 2.525E-06 | global batch size: 16 | lm loss: 7.845489E+00 | loss scale: 8192.0 | grad norm: 47458.318 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 570/ 159576 | consumed samples: 9120 | elapsed time per iteration (ms): 13702.3 | learning rate: 2.530E-06 | global batch size: 16 | lm loss: 7.848298E+00 | loss scale: 8192.0 | grad norm: 53032.711 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 571/ 159576 | consumed samples: 9136 | elapsed 
time per iteration (ms): 13866.4 | learning rate: 2.534E-06 | global batch size: 16 | lm loss: 7.659620E+00 | loss scale: 8192.0 | grad norm: 37376.686 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 572/ 159576 | consumed samples: 9152 | elapsed time per iteration (ms): 14443.8 | learning rate: 2.538E-06 | global batch size: 16 | lm loss: 7.711428E+00 | loss scale: 8192.0 | grad norm: 36846.713 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 573/ 159576 | consumed samples: 9168 | elapsed time per iteration (ms): 13723.1 | learning rate: 2.543E-06 | global batch size: 16 | lm loss: 7.800463E+00 | loss scale: 8192.0 | grad norm: 40022.109 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 574/ 159576 | consumed samples: 9184 | elapsed time per iteration (ms): 13313.2 | learning rate: 2.547E-06 | global batch size: 16 | lm loss: 7.722570E+00 | loss scale: 8192.0 | grad norm: 57675.937 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 575/ 159576 | consumed samples: 9200 | elapsed time per iteration (ms): 13533.3 | learning rate: 2.552E-06 | global batch size: 16 | lm loss: 7.797169E+00 | loss scale: 8192.0 | grad norm: 44067.573 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 576/ 159576 | consumed samples: 9216 | elapsed time per iteration (ms): 13750.6 | learning rate: 2.556E-06 | global batch size: 16 | lm loss: 7.624088E+00 | loss scale: 8192.0 | grad norm: 37579.519 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 577/ 159576 | consumed samples: 9232 | elapsed time per iteration (ms): 14117.7 | learning rate: 2.561E-06 | global batch size: 16 | lm loss: 7.644238E+00 | loss scale: 8192.0 | grad norm: 57135.338 | num zeros: 0.0 | number of 
skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 578/ 159576 | consumed samples: 9248 | elapsed time per iteration (ms): 13229.4 | learning rate: 2.565E-06 | global batch size: 16 | lm loss: 7.769429E+00 | loss scale: 8192.0 | grad norm: 45266.144 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 579/ 159576 | consumed samples: 9264 | elapsed time per iteration (ms): 13610.6 | learning rate: 2.570E-06 | global batch size: 16 | lm loss: 7.508770E+00 | loss scale: 8192.0 | grad norm: 35604.839 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 580/ 159576 | consumed samples: 9280 | elapsed time per iteration (ms): 13468.6 | learning rate: 2.574E-06 | global batch size: 16 | lm loss: 7.727168E+00 | loss scale: 8192.0 | grad norm: 37920.954 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 581/ 159576 | consumed samples: 9296 | elapsed time per iteration (ms): 14350.0 | learning rate: 2.578E-06 | global batch size: 16 | lm loss: 7.883451E+00 | loss scale: 8192.0 | grad norm: 46515.319 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 582/ 159576 | consumed samples: 9312 | elapsed time per iteration (ms): 13963.5 | learning rate: 2.583E-06 | global batch size: 16 | lm loss: 7.781512E+00 | loss scale: 8192.0 | grad norm: 50170.474 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 583/ 159576 | consumed samples: 9328 | elapsed time per iteration (ms): 13557.9 | learning rate: 2.587E-06 | global batch size: 16 | lm loss: 7.964473E+00 | loss scale: 8192.0 | grad norm: 29593.283 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 584/ 159576 | consumed samples: 9344 | elapsed time per iteration (ms): 13684.8 | learning rate: 2.592E-06 | 
global batch size: 16 | lm loss: 7.855813E+00 | loss scale: 8192.0 | grad norm: 39619.717 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 585/ 159576 | consumed samples: 9360 | elapsed time per iteration (ms): 13900.2 | learning rate: 2.596E-06 | global batch size: 16 | lm loss: 7.877661E+00 | loss scale: 8192.0 | grad norm: 31203.205 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 586/ 159576 | consumed samples: 9376 | elapsed time per iteration (ms): 13512.1 | learning rate: 2.601E-06 | global batch size: 16 | lm loss: 7.887114E+00 | loss scale: 8192.0 | grad norm: 63261.561 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 587/ 159576 | consumed samples: 9392 | elapsed time per iteration (ms): 13501.8 | learning rate: 2.605E-06 | global batch size: 16 | lm loss: 7.815706E+00 | loss scale: 8192.0 | grad norm: 47655.867 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 588/ 159576 | consumed samples: 9408 | elapsed time per iteration (ms): 13350.5 | learning rate: 2.609E-06 | global batch size: 16 | lm loss: 7.754656E+00 | loss scale: 8192.0 | grad norm: 49073.965 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 589/ 159576 | consumed samples: 9424 | elapsed time per iteration (ms): 13532.4 | learning rate: 2.614E-06 | global batch size: 16 | lm loss: 7.622519E+00 | loss scale: 8192.0 | grad norm: 39015.125 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 590/ 159576 | consumed samples: 9440 | elapsed time per iteration (ms): 13725.1 | learning rate: 2.618E-06 | global batch size: 16 | lm loss: 7.841989E+00 | loss scale: 8192.0 | grad norm: 59373.276 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) 
iteration 591/ 159576 | consumed samples: 9456 | elapsed time per iteration (ms): 13818.0 | learning rate: 2.623E-06 | global batch size: 16 | lm loss: 7.730304E+00 | loss scale: 8192.0 | grad norm: 56512.310 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 592/ 159576 | consumed samples: 9472 | elapsed time per iteration (ms): 13289.0 | learning rate: 2.627E-06 | global batch size: 16 | lm loss: 7.849043E+00 | loss scale: 8192.0 | grad norm: 44031.624 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 593/ 159576 | consumed samples: 9488 | elapsed time per iteration (ms): 13614.5 | learning rate: 2.632E-06 | global batch size: 16 | lm loss: 7.807899E+00 | loss scale: 8192.0 | grad norm: 43332.506 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 594/ 159576 | consumed samples: 9504 | elapsed time per iteration (ms): 14163.8 | learning rate: 2.636E-06 | global batch size: 16 | lm loss: 7.765454E+00 | loss scale: 8192.0 | grad norm: 57221.926 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 595/ 159576 | consumed samples: 9520 | elapsed time per iteration (ms): 13156.1 | learning rate: 2.641E-06 | global batch size: 16 | lm loss: 7.647946E+00 | loss scale: 8192.0 | grad norm: 61799.391 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 596/ 159576 | consumed samples: 9536 | elapsed time per iteration (ms): 13612.4 | learning rate: 2.645E-06 | global batch size: 16 | lm loss: 7.788985E+00 | loss scale: 8192.0 | grad norm: 47569.358 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 597/ 159576 | consumed samples: 9552 | elapsed time per iteration (ms): 13614.3 | learning rate: 2.649E-06 | global batch size: 16 | lm loss: 7.796825E+00 | loss scale: 8192.0 | 
grad norm: 34793.812 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)

iteration / 159576 | consumed samples | elapsed time per iteration (ms) | learning rate | lm loss | grad norm
(every iteration below: global batch size: 16 | loss scale: 8192.0 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0)

598 |  9568 | 13701.2 | 2.654E-06 | 7.797745E+00 |  78279.259
599 |  9584 | 13638.2 | 2.658E-06 | 7.724266E+00 |  52804.639
600 |  9600 | 13579.9 | 2.663E-06 | 7.820310E+00 |  37266.274
601 |  9616 | 13865.9 | 2.667E-06 | 7.770097E+00 |  35207.333
602 |  9632 | 13180.7 | 2.672E-06 | 7.816167E+00 |  38744.019
603 |  9648 | 13931.1 | 2.676E-06 | 7.817324E+00 |  36573.432
604 |  9664 | 13626.6 | 2.680E-06 | 7.730925E+00 |  34465.028
605 |  9680 | 13615.1 | 2.685E-06 | 7.862791E+00 |  36177.270
606 |  9696 | 13496.6 | 2.689E-06 | 7.773019E+00 |  41679.512
607 |  9712 | 14055.9 | 2.694E-06 | 7.785677E+00 |  37271.202
608 |  9728 | 13879.6 | 2.698E-06 | 7.825086E+00 |  47809.442
609 |  9744 | 13552.3 | 2.703E-06 | 7.740236E+00 |  52434.959
610 |  9760 | 13176.0 | 2.707E-06 | 7.737531E+00 |  48525.539
611 |  9776 | 13593.3 | 2.712E-06 | 7.592016E+00 |  43005.689
612 |  9792 | 13859.6 | 2.716E-06 | 7.717112E+00 |  39297.786
613 |  9808 | 13457.1 | 2.720E-06 | 7.876259E+00 |  46784.787
614 |  9824 | 13891.1 | 2.725E-06 | 7.783233E+00 |  55950.281
615 |  9840 | 13986.9 | 2.729E-06 | 7.671467E+00 |  37634.889
616 |  9856 | 14382.5 | 2.734E-06 | 7.716076E+00 |  39465.766
617 |  9872 | 13446.9 | 2.738E-06 | 7.701165E+00 |  33600.381
618 |  9888 | 13921.0 | 2.743E-06 | 7.846385E+00 |  34178.825
619 |  9904 | 13866.6 | 2.747E-06 | 7.788978E+00 |  39840.427
620 |  9920 | 14194.3 | 2.751E-06 | 7.718859E+00 |  35668.255
621 |  9936 | 14052.1 | 2.756E-06 | 7.815299E+00 |  65082.529
622 |  9952 | 13986.4 | 2.760E-06 | 7.647432E+00 |  30577.960
623 |  9968 | 14070.1 | 2.765E-06 | 7.470105E+00 |  49150.823
624 |  9984 | 13591.8 | 2.769E-06 | 7.751683E+00 |  37773.421
625 | 10000 | 14109.1 | 2.774E-06 | 7.850559E+00 |  49716.008
626 | 10016 | 13883.7 | 2.778E-06 | 7.761450E+00 |  40472.569
627 | 10032 | 13871.1 | 2.783E-06 | 7.638558E+00 |  32194.907
628 | 10048 | 14009.2 | 2.787E-06 | 7.602344E+00 |  48067.346
629 | 10064 | 14668.1 | 2.791E-06 | 7.641259E+00 |  36222.940
630 | 10080 | 13862.3 | 2.796E-06 | 7.665779E+00 |  42515.535
631 | 10096 | 13588.5 | 2.800E-06 | 7.754525E+00 |  49054.878
632 | 10112 | 13844.9 | 2.805E-06 | 7.774928E+00 |  45662.541
633 | 10128 | 14341.8 | 2.809E-06 | 7.554594E+00 |  60744.743
634 | 10144 | 13746.1 | 2.814E-06 | 7.637143E+00 |  49330.376
635 | 10160 | 13894.5 | 2.818E-06 | 7.983640E+00 |  49417.095
636 | 10176 | 14194.7 | 2.822E-06 | 7.681066E+00 |  61468.093
637 | 10192 | 13961.2 | 2.827E-06 | 7.862648E+00 |  72192.162
638 | 10208 | 13647.5 | 2.831E-06 | 7.569575E+00 |  45669.961
639 | 10224 | 13856.5 | 2.836E-06 | 7.844266E+00 |  36677.085
640 | 10240 | 14073.9 | 2.840E-06 | 7.845327E+00 |  96907.467
641 | 10256 | 13796.2 | 2.845E-06 | 7.647357E+00 |  57700.704
642 | 10272 | 14118.9 | 2.849E-06 | 7.207680E+00 |  51064.672
643 | 10288 | 14102.7 | 2.854E-06 | 7.651158E+00 |  42382.351
644 | 10304 | 14051.2 | 2.858E-06 | 7.854011E+00 |  91247.279
645 | 10320 | 13538.9 | 2.862E-06 | 7.769484E+00 |  69652.208
646 | 10336 | 14249.0 | 2.867E-06 | 7.553013E+00 |  51636.193
647 | 10352 | 13970.2 | 2.871E-06 | 8.084120E+00 |  43277.569
648 | 10368 | 13853.5 | 2.876E-06 | 7.727980E+00 |  61582.321
649 | 10384 | 13732.7 | 2.880E-06 | 8.087885E+00 |  80675.460
650 | 10400 | 14065.0 | 2.885E-06 | 7.735159E+00 |  57826.799
651 | 10416 | 14427.2 | 2.889E-06 | 7.631308E+00 |  36267.499
652 | 10432 | 13615.7 | 2.893E-06 | 7.756464E+00 |  90673.943
653 | 10448 | 13935.6 | 2.898E-06 | 7.687772E+00 |  73567.241
654 | 10464 | 14106.4 | 2.902E-06 | 7.805472E+00 |  43212.657
655 | 10480 | 13870.0 | 2.907E-06 | 7.733329E+00 |  42721.480
656 | 10496 | 13912.1 | 2.911E-06 | 7.764544E+00 |  95237.236
657 | 10512 | 13959.6 | 2.916E-06 | 7.873410E+00 |  58039.908
658 | 10528 | 14236.4 | 2.920E-06 | 7.776018E+00 |  47844.539
659 | 10544 | 14055.2 | 2.925E-06 | 7.913632E+00 |  52680.297
660 | 10560 | 13952.7 | 2.929E-06 | 7.682195E+00 |  43818.277
661 | 10576 | 14150.0 | 2.933E-06 | 7.787490E+00 |  79352.333
662 | 10592 | 13865.0 | 2.938E-06 | 7.774850E+00 |  38730.216
663 | 10608 | 14161.1 | 2.942E-06 | 7.580084E+00 |  41013.803
664 | 10624 | 13917.2 | 2.947E-06 | 7.885849E+00 |  52940.997
665 | 10640 | 14187.3 | 2.951E-06 | 7.708643E+00 |  45471.400
666 | 10656 | 13816.1 | 2.956E-06 | 7.852731E+00 |  34948.074
667 | 10672 | 13998.2 | 2.960E-06 | 7.783283E+00 |  72415.130
668 | 10688 | 14355.3 | 2.964E-06 | 7.606567E+00 |  40358.601
669 | 10704 | 13737.0 | 2.969E-06 | 7.726189E+00 |  40258.377
670 | 10720 | 13793.7 | 2.973E-06 | 7.691747E+00 |  41826.699
671 | 10736 | 13990.9 | 2.978E-06 | 7.731771E+00 |  73683.310
672 | 10752 | 14342.7 | 2.982E-06 | 7.751697E+00 |  45162.989
673 | 10768 | 14019.6 | 2.987E-06 | 7.628830E+00 |  50354.520
674 | 10784 | 13505.9 | 2.991E-06 | 7.737679E+00 |  42630.535
675 | 10800 | 14062.7 | 2.996E-06 | 7.697219E+00 |  74141.374
676 | 10816 | 14348.9 | 3.000E-06 | 7.685856E+00 |  42229.307
677 | 10832 | 13490.6 | 3.004E-06 | 7.675433E+00 |  41266.542
678 | 10848 | 13864.0 | 3.009E-06 | 7.602362E+00 |  28128.791
679 | 10864 | 13876.8 | 3.013E-06 | 7.921748E+00 |  94093.080
680 | 10880 | 14089.6 | 3.018E-06 | 7.932827E+00 |  66492.252
681 | 10896 | 13869.3 | 3.022E-06 | 7.712299E+00 |  48293.630
682 | 10912 | 14135.1 | 3.027E-06 | 7.638190E+00 |  38847.818
683 | 10928 | 13923.5 | 3.031E-06 | 7.728378E+00 | 145094.985
684 | 10944 | 13370.2 | 3.036E-06 | 7.695971E+00 |  72337.161
685 | 10960 | 14077.4 | 3.040E-06 | 7.967864E+00 |  60013.396
686 | 10976 | 13866.9 | 3.044E-06 | 7.790969E+00 |  66989.408
687 | 10992 | 13994.5 | 3.049E-06 | 7.558614E+00 |  41316.798
688 | 11008 | 13732.9 | 3.053E-06 | 7.831646E+00 | 113582.407
689 | 11024 | 14223.7 | 3.058E-06 | 7.934176E+00 |  88203.837
690 | 11040 | 14149.5 | 3.062E-06 | 8.017797E+00 |  58624.816
691 | 11056 | 13400.2 | 3.067E-06 | 7.660833E+00 |  55959.298
692 | 11072 | 13833.8 | 3.071E-06 | 7.664068E+00 |  59276.124
693 | 11088 | 14240.4 | 3.075E-06 | 7.707018E+00 |  93883.971
694 | 11104 | 13875.3 | 3.080E-06 | 7.786274E+00 |  64903.918
695 | 11120 | 13813.0 | 3.084E-06 | 7.512930E+00 |  51983.944
696 | 11136 | 13976.3 | 3.089E-06 | 7.692935E+00 |  60144.327
697 | 11152 | 14241.9 | 3.093E-06 | 7.665162E+00 |  45825.959
698 | 11168 | 13633.7 | 3.098E-06 | 7.619460E+00 |  50817.283
699 | 11184 | 13862.8 | 3.102E-06 | 7.827911E+00 |  55475.644
700 | 11200 | 13992.4 | 3.107E-06 | 7.651889E+00 |  41255.123
701 | 11216 | 13980.6 | 3.111E-06 | 7.715150E+00 |  54466.199
702 | 11232 | 13968.4 | 3.115E-06 | 7.782993E+00 |  52144.399
703 | 11248 | 13960.9 | 3.120E-06 | 7.681329E+00 |  51153.990
704 | 11264 | 14082.5 | 3.124E-06 | 7.697348E+00 |  30117.468
705 | 11280 | 13980.4 | 3.129E-06 | 7.733425E+00 |  49027.047
706 | 11296 | 13865.4 | 3.133E-06 | 7.844088E+00 |  43555.293
707 | 11312 | 13817.5 | 3.138E-06 | 7.752273E+00 |  96517.184
708 | 11328 | 13958.9 | 3.142E-06 | 7.757376E+00 |  77216.323
709 | 11344 | 13428.3 | 3.146E-06 | 7.687693E+00 |  57064.888
710 | 11360 | 13648.2 | 3.151E-06 | 7.663705E+00 |  50512.811
711 | 11376 | 14017.0 | 3.155E-06 | 7.597622E+00 |  52114.282
712 | 11392 | 13780.7 | 3.160E-06 | 7.771480E+00 | 169756.868
713 | 11408 | 13096.8 | 3.164E-06 | 7.713109E+00 |  87094.017
714 | 11424 | 13743.9 | 3.169E-06 | 7.749861E+00 |  49749.127
715 | 11440 | 14274.0 | 3.173E-06 | 7.797529E+00 |  51932.227
716 | 11456 | 13788.8 | 3.178E-06 | 7.704132E+00 |  68478.047
717 | 11472 | 13977.5 | 3.182E-06 | 7.746219E+00 | 107770.469
718 | 11488 | 13786.8 | 3.186E-06 | 7.617724E+00 |  57419.512
719 | 11504 | 14003.5 | 3.191E-06 | 7.642632E+00 |  48000.387
720 | 11520 | 13651.1 | 3.195E-06 | 7.790938E+00 |  45384.886
721 | 11536 | 13820.3 | 3.200E-06 | 7.799318E+00 |  94827.685
722 | 11552 | 13998.9 | 3.204E-06 | 7.924202E+00 | 106713.536
723 | 11568 | 13787.6 | 3.209E-06 | 7.662113E+00 |  53132.240
724 | 11584 | 14003.4 | 3.213E-06 | 7.735355E+00 |  46503.362
725 | 11600 | 14211.4 | 3.217E-06 | 7.413515E+00 |  46300.080
726 | 11616 | 14085.1 | 3.222E-06 | 7.793005E+00 | 123901.591
727 | 11632 | 13498.1 | 3.226E-06 | 7.570110E+00 | 110746.235
728 | 11648 | 13944.5 | 3.231E-06 | 7.805285E+00 |  54666.569
729 | 11664 | 13478.9 | 3.235E-06 | 7.702326E+00 |  95219.862
730 | 11680 | 13419.9 | 3.240E-06 | 7.694516E+00 |  44428.528
731 | 11696 | 13890.7 | 3.244E-06 | 7.656667E+00 |  79142.888
732 | 11712 | 14381.2 | 3.249E-06 | 7.689932E+00 |  69883.450

[2021-09-24 05:03:31] PULSE: tr8-104B is waiting for the previous job to finish before scheduling a new one using the dependency mechanism (1162855_[1-10%1] on 'gpu_p13' partition)
[2021-09-24 05:03:31] PULSE: tr8-104B is running for 1:02:26 since 2021-09-24T04:01:05 (1162747 on 'gpu_p13' partition (r6i4n[5,7],r6i5n[2,7-8],r6i6n[0,2,6],r7i2n[4-5],r7i6n[2-4],r7i7n[7-8],r8i0n[2-3,5-8],r8i1n[0,2-4],r8i2n8,r8i3n[0-2],r8i5n[3-4],r8i7n[3-8],r9i0n[0-2],r9i1n[0-3],r9i2n[3-5,8],r9i3n[0-1,7-8],r9i4n[0-2],r9i5n[3-8],r9i6n[0,7-8])

733 | 11728 | 13725.2 | 3.253E-06 | 7.808900E+00 |  50692.428

iteration 734/ 159576 | consumed samples: 11744 | elapsed time per iteration
(ms): 13115.2 | learning rate: 3.257E-06 | global batch size: 16 | lm loss: 7.737029E+00 | loss scale: 8192.0 | grad norm: 69149.275 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 735/ 159576 | consumed samples: 11760 | elapsed time per iteration (ms): 13493.9 | learning rate: 3.262E-06 | global batch size: 16 | lm loss: 7.630354E+00 | loss scale: 8192.0 | grad norm: 85240.602 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 736/ 159576 | consumed samples: 11776 | elapsed time per iteration (ms): 13636.0 | learning rate: 3.266E-06 | global batch size: 16 | lm loss: 7.626644E+00 | loss scale: 8192.0 | grad norm: 57646.552 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 737/ 159576 | consumed samples: 11792 | elapsed time per iteration (ms): 13810.1 | learning rate: 3.271E-06 | global batch size: 16 | lm loss: 7.526936E+00 | loss scale: 8192.0 | grad norm: 95065.076 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 738/ 159576 | consumed samples: 11808 | elapsed time per iteration (ms): 13385.6 | learning rate: 3.275E-06 | global batch size: 16 | lm loss: 7.820796E+00 | loss scale: 8192.0 | grad norm: 113407.272 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 739/ 159576 | consumed samples: 11824 | elapsed time per iteration (ms): 13689.8 | learning rate: 3.280E-06 | global batch size: 16 | lm loss: 7.774467E+00 | loss scale: 8192.0 | grad norm: 98657.078 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 740/ 159576 | consumed samples: 11840 | elapsed time per iteration (ms): 13965.2 | learning rate: 3.284E-06 | global batch size: 16 | lm loss: 7.762564E+00 | loss scale: 8192.0 | grad norm: 71745.217 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | time (ms) iteration 741/ 159576 | consumed samples: 11856 | elapsed time per iteration (ms): 13569.2 | learning rate: 3.288E-06 | global batch size: 16 | lm loss: 7.608281E+00 | loss scale: 8192.0 | grad norm: 40905.544 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 742/ 159576 | consumed samples: 11872 | elapsed time per iteration (ms): 13635.8 | learning rate: 3.293E-06 | global batch size: 16 | lm loss: 7.570668E+00 | loss scale: 8192.0 | grad norm: 80257.423 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 743/ 159576 | consumed samples: 11888 | elapsed time per iteration (ms): 13669.8 | learning rate: 3.297E-06 | global batch size: 16 | lm loss: 7.586653E+00 | loss scale: 8192.0 | grad norm: 56412.186 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 744/ 159576 | consumed samples: 11904 | elapsed time per iteration (ms): 13473.9 | learning rate: 3.302E-06 | global batch size: 16 | lm loss: 7.701398E+00 | loss scale: 8192.0 | grad norm: 100221.753 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 745/ 159576 | consumed samples: 11920 | elapsed time per iteration (ms): 13453.8 | learning rate: 3.306E-06 | global batch size: 16 | lm loss: 7.772648E+00 | loss scale: 8192.0 | grad norm: 88519.971 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 746/ 159576 | consumed samples: 11936 | elapsed time per iteration (ms): 13732.5 | learning rate: 3.311E-06 | global batch size: 16 | lm loss: 7.940891E+00 | loss scale: 8192.0 | grad norm: 66980.299 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 747/ 159576 | consumed samples: 11952 | elapsed time per iteration (ms): 13956.5 | learning rate: 3.315E-06 | 
global batch size: 16 | lm loss: 7.879022E+00 | loss scale: 8192.0 | grad norm: 73008.302 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 748/ 159576 | consumed samples: 11968 | elapsed time per iteration (ms): 13250.5 | learning rate: 3.320E-06 | global batch size: 16 | lm loss: 7.693480E+00 | loss scale: 8192.0 | grad norm: 45346.275 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 749/ 159576 | consumed samples: 11984 | elapsed time per iteration (ms): 13529.3 | learning rate: 3.324E-06 | global batch size: 16 | lm loss: 7.658270E+00 | loss scale: 8192.0 | grad norm: 156261.718 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 750/ 159576 | consumed samples: 12000 | elapsed time per iteration (ms): 14110.0 | learning rate: 3.328E-06 | global batch size: 16 | lm loss: 7.741945E+00 | loss scale: 8192.0 | grad norm: 121818.343 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 751/ 159576 | consumed samples: 12016 | elapsed time per iteration (ms): 13463.3 | learning rate: 3.333E-06 | global batch size: 16 | lm loss: 7.631550E+00 | loss scale: 8192.0 | grad norm: 69835.617 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 752/ 159576 | consumed samples: 12032 | elapsed time per iteration (ms): 13424.2 | learning rate: 3.337E-06 | global batch size: 16 | lm loss: 7.669878E+00 | loss scale: 8192.0 | grad norm: 47821.077 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 753/ 159576 | consumed samples: 12048 | elapsed time per iteration (ms): 13566.2 | learning rate: 3.342E-06 | global batch size: 16 | lm loss: 7.567214E+00 | loss scale: 8192.0 | grad norm: 68234.683 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time 
(ms) iteration 754/ 159576 | consumed samples: 12064 | elapsed time per iteration (ms): 14065.3 | learning rate: 3.346E-06 | global batch size: 16 | lm loss: 7.753268E+00 | loss scale: 8192.0 | grad norm: 134900.848 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 755/ 159576 | consumed samples: 12080 | elapsed time per iteration (ms): 13518.6 | learning rate: 3.351E-06 | global batch size: 16 | lm loss: 7.552173E+00 | loss scale: 8192.0 | grad norm: 48964.281 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 756/ 159576 | consumed samples: 12096 | elapsed time per iteration (ms): 13728.7 | learning rate: 3.355E-06 | global batch size: 16 | lm loss: 7.735795E+00 | loss scale: 8192.0 | grad norm: 73204.769 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 757/ 159576 | consumed samples: 12112 | elapsed time per iteration (ms): 14082.3 | learning rate: 3.359E-06 | global batch size: 16 | lm loss: 7.910018E+00 | loss scale: 8192.0 | grad norm: 83429.905 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 758/ 159576 | consumed samples: 12128 | elapsed time per iteration (ms): 13428.5 | learning rate: 3.364E-06 | global batch size: 16 | lm loss: 7.669195E+00 | loss scale: 8192.0 | grad norm: 61137.847 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 759/ 159576 | consumed samples: 12144 | elapsed time per iteration (ms): 13632.1 | learning rate: 3.368E-06 | global batch size: 16 | lm loss: 7.795278E+00 | loss scale: 8192.0 | grad norm: 59141.292 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 760/ 159576 | consumed samples: 12160 | elapsed time per iteration (ms): 13624.6 | learning rate: 3.373E-06 | global batch size: 16 | lm loss: 7.692988E+00 | loss 
scale: 8192.0 | grad norm: 104447.460 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 761/ 159576 | consumed samples: 12176 | elapsed time per iteration (ms): 13611.0 | learning rate: 3.377E-06 | global batch size: 16 | lm loss: 7.784515E+00 | loss scale: 8192.0 | grad norm: 51368.314 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 762/ 159576 | consumed samples: 12192 | elapsed time per iteration (ms): 13558.6 | learning rate: 3.382E-06 | global batch size: 16 | lm loss: 7.582584E+00 | loss scale: 8192.0 | grad norm: 61983.639 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 763/ 159576 | consumed samples: 12208 | elapsed time per iteration (ms): 13793.4 | learning rate: 3.386E-06 | global batch size: 16 | lm loss: 7.743572E+00 | loss scale: 8192.0 | grad norm: 56837.599 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 764/ 159576 | consumed samples: 12224 | elapsed time per iteration (ms): 13743.7 | learning rate: 3.391E-06 | global batch size: 16 | lm loss: 7.701952E+00 | loss scale: 8192.0 | grad norm: 92476.492 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 765/ 159576 | consumed samples: 12240 | elapsed time per iteration (ms): 13529.8 | learning rate: 3.395E-06 | global batch size: 16 | lm loss: 7.691103E+00 | loss scale: 8192.0 | grad norm: 103276.953 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 766/ 159576 | consumed samples: 12256 | elapsed time per iteration (ms): 13189.2 | learning rate: 3.399E-06 | global batch size: 16 | lm loss: 7.589336E+00 | loss scale: 8192.0 | grad norm: 54735.017 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 767/ 159576 | consumed samples: 12272 | 
elapsed time per iteration (ms): 13483.6 | learning rate: 3.404E-06 | global batch size: 16 | lm loss: 7.717595E+00 | loss scale: 8192.0 | grad norm: 54456.344 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 768/ 159576 | consumed samples: 12288 | elapsed time per iteration (ms): 13780.9 | learning rate: 3.408E-06 | global batch size: 16 | lm loss: 7.852913E+00 | loss scale: 8192.0 | grad norm: 88912.086 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 769/ 159576 | consumed samples: 12304 | elapsed time per iteration (ms): 13724.3 | learning rate: 3.413E-06 | global batch size: 16 | lm loss: 7.716819E+00 | loss scale: 8192.0 | grad norm: 102833.662 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 770/ 159576 | consumed samples: 12320 | elapsed time per iteration (ms): 13377.3 | learning rate: 3.417E-06 | global batch size: 16 | lm loss: 7.597641E+00 | loss scale: 8192.0 | grad norm: 50835.662 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 771/ 159576 | consumed samples: 12336 | elapsed time per iteration (ms): 13692.5 | learning rate: 3.422E-06 | global batch size: 16 | lm loss: 7.478999E+00 | loss scale: 8192.0 | grad norm: 53587.154 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 772/ 159576 | consumed samples: 12352 | elapsed time per iteration (ms): 14180.5 | learning rate: 3.426E-06 | global batch size: 16 | lm loss: 7.546258E+00 | loss scale: 8192.0 | grad norm: 63294.983 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 773/ 159576 | consumed samples: 12368 | elapsed time per iteration (ms): 13096.5 | learning rate: 3.430E-06 | global batch size: 16 | lm loss: 7.711743E+00 | loss scale: 8192.0 | grad norm: 99934.626 | num zeros: 0.0 | 
number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 774/ 159576 | consumed samples: 12384 | elapsed time per iteration (ms): 13520.5 | learning rate: 3.435E-06 | global batch size: 16 | lm loss: 7.645664E+00 | loss scale: 8192.0 | grad norm: 56458.777 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 775/ 159576 | consumed samples: 12400 | elapsed time per iteration (ms): 13630.5 | learning rate: 3.439E-06 | global batch size: 16 | lm loss: 7.603559E+00 | loss scale: 8192.0 | grad norm: 46450.456 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 776/ 159576 | consumed samples: 12416 | elapsed time per iteration (ms): 14027.6 | learning rate: 3.444E-06 | global batch size: 16 | lm loss: 7.737686E+00 | loss scale: 8192.0 | grad norm: 141770.957 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 777/ 159576 | consumed samples: 12432 | elapsed time per iteration (ms): 13425.6 | learning rate: 3.448E-06 | global batch size: 16 | lm loss: 7.584914E+00 | loss scale: 8192.0 | grad norm: 124071.305 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 778/ 159576 | consumed samples: 12448 | elapsed time per iteration (ms): 13642.7 | learning rate: 3.453E-06 | global batch size: 16 | lm loss: 7.606685E+00 | loss scale: 8192.0 | grad norm: 53139.139 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 779/ 159576 | consumed samples: 12464 | elapsed time per iteration (ms): 13834.1 | learning rate: 3.457E-06 | global batch size: 16 | lm loss: 7.786515E+00 | loss scale: 8192.0 | grad norm: 58657.499 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 780/ 159576 | consumed samples: 12480 | elapsed time per iteration (ms): 13091.5 | learning 
rate: 3.462E-06 | global batch size: 16 | lm loss: 7.618142E+00 | loss scale: 8192.0 | grad norm: 37881.566 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 781/ 159576 | consumed samples: 12496 | elapsed time per iteration (ms): 14146.0 | learning rate: 3.466E-06 | global batch size: 16 | lm loss: 7.906812E+00 | loss scale: 8192.0 | grad norm: 114163.942 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 782/ 159576 | consumed samples: 12512 | elapsed time per iteration (ms): 14025.7 | learning rate: 3.470E-06 | global batch size: 16 | lm loss: 7.566094E+00 | loss scale: 8192.0 | grad norm: 46220.333 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 783/ 159576 | consumed samples: 12528 | elapsed time per iteration (ms): 13895.4 | learning rate: 3.475E-06 | global batch size: 16 | lm loss: 7.630446E+00 | loss scale: 8192.0 | grad norm: 64319.125 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 784/ 159576 | consumed samples: 12544 | elapsed time per iteration (ms): 13890.1 | learning rate: 3.479E-06 | global batch size: 16 | lm loss: 7.692337E+00 | loss scale: 8192.0 | grad norm: 48575.291 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 785/ 159576 | consumed samples: 12560 | elapsed time per iteration (ms): 14156.1 | learning rate: 3.484E-06 | global batch size: 16 | lm loss: 7.736514E+00 | loss scale: 8192.0 | grad norm: 90651.125 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 786/ 159576 | consumed samples: 12576 | elapsed time per iteration (ms): 14206.7 | learning rate: 3.488E-06 | global batch size: 16 | lm loss: 7.744794E+00 | loss scale: 8192.0 | grad norm: 84355.344 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan 
iterations: 0 | time (ms) iteration 787/ 159576 | consumed samples: 12592 | elapsed time per iteration (ms): 13622.2 | learning rate: 3.493E-06 | global batch size: 16 | lm loss: 7.672806E+00 | loss scale: 8192.0 | grad norm: 51705.493 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 788/ 159576 | consumed samples: 12608 | elapsed time per iteration (ms): 13771.2 | learning rate: 3.497E-06 | global batch size: 16 | lm loss: 7.713612E+00 | loss scale: 8192.0 | grad norm: 50748.595 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 789/ 159576 | consumed samples: 12624 | elapsed time per iteration (ms): 14226.1 | learning rate: 3.501E-06 | global batch size: 16 | lm loss: 7.630927E+00 | loss scale: 8192.0 | grad norm: 68226.483 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 790/ 159576 | consumed samples: 12640 | elapsed time per iteration (ms): 14175.2 | learning rate: 3.506E-06 | global batch size: 16 | lm loss: 7.523444E+00 | loss scale: 8192.0 | grad norm: 67731.569 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 791/ 159576 | consumed samples: 12656 | elapsed time per iteration (ms): 13844.2 | learning rate: 3.510E-06 | global batch size: 16 | lm loss: 7.357096E+00 | loss scale: 8192.0 | grad norm: 45569.401 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 792/ 159576 | consumed samples: 12672 | elapsed time per iteration (ms): 13884.3 | learning rate: 3.515E-06 | global batch size: 16 | lm loss: 7.701885E+00 | loss scale: 8192.0 | grad norm: 53017.231 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 793/ 159576 | consumed samples: 12688 | elapsed time per iteration (ms): 14159.9 | learning rate: 3.519E-06 | global batch size: 16 | lm loss: 
7.529918E+00 | loss scale: 8192.0 | grad norm: 55466.888 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 794/ 159576 | consumed samples: 12704 | elapsed time per iteration (ms): 13975.0 | learning rate: 3.524E-06 | global batch size: 16 | lm loss: 7.684763E+00 | loss scale: 8192.0 | grad norm: 44801.760 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 795/ 159576 | consumed samples: 12720 | elapsed time per iteration (ms): 13769.3 | learning rate: 3.528E-06 | global batch size: 16 | lm loss: 7.843237E+00 | loss scale: 8192.0 | grad norm: 59761.590 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 796/ 159576 | consumed samples: 12736 | elapsed time per iteration (ms): 13954.1 | learning rate: 3.533E-06 | global batch size: 16 | lm loss: 7.737316E+00 | loss scale: 8192.0 | grad norm: 66240.870 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 797/ 159576 | consumed samples: 12752 | elapsed time per iteration (ms): 13982.4 | learning rate: 3.537E-06 | global batch size: 16 | lm loss: 7.712746E+00 | loss scale: 8192.0 | grad norm: 53315.803 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 798/ 159576 | consumed samples: 12768 | elapsed time per iteration (ms): 14164.1 | learning rate: 3.541E-06 | global batch size: 16 | lm loss: 7.649867E+00 | loss scale: 8192.0 | grad norm: 46451.967 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 799/ 159576 | consumed samples: 12784 | elapsed time per iteration (ms): 14010.0 | learning rate: 3.546E-06 | global batch size: 16 | lm loss: 7.833376E+00 | loss scale: 8192.0 | grad norm: 65829.045 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 800/ 159576 | 
consumed samples: 12800 | elapsed time per iteration (ms): 14307.9 | learning rate: 3.550E-06 | global batch size: 16 | lm loss: 7.790625E+00 | loss scale: 8192.0 | grad norm: 71968.262 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 801/ 159576 | consumed samples: 12816 | elapsed time per iteration (ms): 13972.6 | learning rate: 3.555E-06 | global batch size: 16 | lm loss: 7.611866E+00 | loss scale: 8192.0 | grad norm: 48597.309 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 802/ 159576 | consumed samples: 12832 | elapsed time per iteration (ms): 13959.0 | learning rate: 3.559E-06 | global batch size: 16 | lm loss: 7.617666E+00 | loss scale: 8192.0 | grad norm: 147672.383 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 803/ 159576 | consumed samples: 12848 | elapsed time per iteration (ms): 13806.4 | learning rate: 3.564E-06 | global batch size: 16 | lm loss: 7.813154E+00 | loss scale: 8192.0 | grad norm: 121980.871 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 804/ 159576 | consumed samples: 12864 | elapsed time per iteration (ms): 13949.2 | learning rate: 3.568E-06 | global batch size: 16 | lm loss: 7.654176E+00 | loss scale: 8192.0 | grad norm: 52351.960 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 805/ 159576 | consumed samples: 12880 | elapsed time per iteration (ms): 13801.9 | learning rate: 3.572E-06 | global batch size: 16 | lm loss: 7.564305E+00 | loss scale: 8192.0 | grad norm: 62792.545 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 806/ 159576 | consumed samples: 12896 | elapsed time per iteration (ms): 13954.3 | learning rate: 3.577E-06 | global batch size: 16 | lm loss: 7.707185E+00 | loss scale: 8192.0 | grad norm: 
64767.398 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 807/ 159576 | consumed samples: 12912 | elapsed time per iteration (ms): 14250.4 | learning rate: 3.581E-06 | global batch size: 16 | lm loss: 7.578569E+00 | loss scale: 8192.0 | grad norm: 73926.917 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 808/ 159576 | consumed samples: 12928 | elapsed time per iteration (ms): 14201.0 | learning rate: 3.586E-06 | global batch size: 16 | lm loss: 7.631069E+00 | loss scale: 8192.0 | grad norm: 110069.754 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 809/ 159576 | consumed samples: 12944 | elapsed time per iteration (ms): 13598.4 | learning rate: 3.590E-06 | global batch size: 16 | lm loss: 7.628491E+00 | loss scale: 8192.0 | grad norm: 49670.988 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 810/ 159576 | consumed samples: 12960 | elapsed time per iteration (ms): 13941.6 | learning rate: 3.595E-06 | global batch size: 16 | lm loss: 7.759563E+00 | loss scale: 8192.0 | grad norm: 45971.027 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 811/ 159576 | consumed samples: 12976 | elapsed time per iteration (ms): 14298.0 | learning rate: 3.599E-06 | global batch size: 16 | lm loss: 7.502759E+00 | loss scale: 8192.0 | grad norm: 77602.902 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 812/ 159576 | consumed samples: 12992 | elapsed time per iteration (ms): 13416.1 | learning rate: 3.604E-06 | global batch size: 16 | lm loss: 7.624804E+00 | loss scale: 8192.0 | grad norm: 95989.772 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 813/ 159576 | consumed samples: 13008 | elapsed time per iteration 
(ms): 13579.1 | learning rate: 3.608E-06 | global batch size: 16 | lm loss: 7.542982E+00 | loss scale: 8192.0 | grad norm: 52064.554 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 814/ 159576 | consumed samples: 13024 | elapsed time per iteration (ms): 14100.2 | learning rate: 3.612E-06 | global batch size: 16 | lm loss: 7.676429E+00 | loss scale: 8192.0 | grad norm: 38221.569 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 815/ 159576 | consumed samples: 13040 | elapsed time per iteration (ms): 14346.2 | learning rate: 3.617E-06 | global batch size: 16 | lm loss: 7.695131E+00 | loss scale: 8192.0 | grad norm: 57869.513 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 816/ 159576 | consumed samples: 13056 | elapsed time per iteration (ms): 13771.7 | learning rate: 3.621E-06 | global batch size: 16 | lm loss: 7.578337E+00 | loss scale: 8192.0 | grad norm: 49771.695 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 817/ 159576 | consumed samples: 13072 | elapsed time per iteration (ms): 13776.0 | learning rate: 3.626E-06 | global batch size: 16 | lm loss: 7.583301E+00 | loss scale: 8192.0 | grad norm: 46160.592 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 818/ 159576 | consumed samples: 13088 | elapsed time per iteration (ms): 14040.8 | learning rate: 3.630E-06 | global batch size: 16 | lm loss: 7.773385E+00 | loss scale: 8192.0 | grad norm: 42207.098 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 819/ 159576 | consumed samples: 13104 | elapsed time per iteration (ms): 13835.3 | learning rate: 3.635E-06 | global batch size: 16 | lm loss: 7.905573E+00 | loss scale: 8192.0 | grad norm: 111883.611 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | time (ms) iteration 820/ 159576 | consumed samples: 13120 | elapsed time per iteration (ms): 13924.4 | learning rate: 3.639E-06 | global batch size: 16 | lm loss: 7.730550E+00 | loss scale: 8192.0 | grad norm: 75433.173 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 821/ 159576 | consumed samples: 13136 | elapsed time per iteration (ms): 13915.0 | learning rate: 3.643E-06 | global batch size: 16 | lm loss: 7.688564E+00 | loss scale: 8192.0 | grad norm: 41927.693 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 822/ 159576 | consumed samples: 13152 | elapsed time per iteration (ms): 13890.4 | learning rate: 3.648E-06 | global batch size: 16 | lm loss: 7.552343E+00 | loss scale: 8192.0 | grad norm: 96543.909 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 823/ 159576 | consumed samples: 13168 | elapsed time per iteration (ms): 13560.6 | learning rate: 3.652E-06 | global batch size: 16 | lm loss: 7.617982E+00 | loss scale: 8192.0 | grad norm: 56370.152 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 824/ 159576 | consumed samples: 13184 | elapsed time per iteration (ms): 14024.1 | learning rate: 3.657E-06 | global batch size: 16 | lm loss: 7.600199E+00 | loss scale: 8192.0 | grad norm: 61928.907 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 825/ 159576 | consumed samples: 13200 | elapsed time per iteration (ms): 14003.2 | learning rate: 3.661E-06 | global batch size: 16 | lm loss: 7.541789E+00 | loss scale: 8192.0 | grad norm: 56863.341 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 826/ 159576 | consumed samples: 13216 | elapsed time per iteration (ms): 13848.3 | learning rate: 3.666E-06 | 
global batch size: 16 | lm loss: 7.782004E+00 | loss scale: 8192.0 | grad norm: 59985.533 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 827/ 159576 | consumed samples: 13232 | elapsed time per iteration (ms): 13902.1 | learning rate: 3.670E-06 | global batch size: 16 | lm loss: 7.733065E+00 | loss scale: 8192.0 | grad norm: 39148.960 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 828/ 159576 | consumed samples: 13248 | elapsed time per iteration (ms): 14356.1 | learning rate: 3.675E-06 | global batch size: 16 | lm loss: 7.625387E+00 | loss scale: 8192.0 | grad norm: 56612.459 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 829/ 159576 | consumed samples: 13264 | elapsed time per iteration (ms): 14368.0 | learning rate: 3.679E-06 | global batch size: 16 | lm loss: 7.759684E+00 | loss scale: 8192.0 | grad norm: 67635.907 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 830/ 159576 | consumed samples: 13280 | elapsed time per iteration (ms): 13627.9 | learning rate: 3.683E-06 | global batch size: 16 | lm loss: 7.694915E+00 | loss scale: 8192.0 | grad norm: 60776.045 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 831/ 159576 | consumed samples: 13296 | elapsed time per iteration (ms): 13498.1 | learning rate: 3.688E-06 | global batch size: 16 | lm loss: 7.492978E+00 | loss scale: 8192.0 | grad norm: 42000.715 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 832/ 159576 | consumed samples: 13312 | elapsed time per iteration (ms): 13938.9 | learning rate: 3.692E-06 | global batch size: 16 | lm loss: 7.616700E+00 | loss scale: 8192.0 | grad norm: 105579.700 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time 
(ms) iteration 833/ 159576 | consumed samples: 13328 | elapsed time per iteration (ms): 13687.8 | learning rate: 3.697E-06 | global batch size: 16 | lm loss: 7.715961E+00 | loss scale: 8192.0 | grad norm: 78119.339 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 834/ 159576 | consumed samples: 13344 | elapsed time per iteration (ms): 13717.8 | learning rate: 3.701E-06 | global batch size: 16 | lm loss: 7.778497E+00 | loss scale: 8192.0 | grad norm: 58326.728 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 835/ 159576 | consumed samples: 13360 | elapsed time per iteration (ms): 13913.9 | learning rate: 3.706E-06 | global batch size: 16 | lm loss: 7.718093E+00 | loss scale: 8192.0 | grad norm: 48122.513 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 836/ 159576 | consumed samples: 13376 | elapsed time per iteration (ms): 14318.5 | learning rate: 3.710E-06 | global batch size: 16 | lm loss: 7.521303E+00 | loss scale: 8192.0 | grad norm: 60082.150 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 837/ 159576 | consumed samples: 13392 | elapsed time per iteration (ms): 13780.0 | learning rate: 3.714E-06 | global batch size: 16 | lm loss: 7.538383E+00 | loss scale: 8192.0 | grad norm: 61043.143 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 838/ 159576 | consumed samples: 13408 | elapsed time per iteration (ms): 13961.2 | learning rate: 3.719E-06 | global batch size: 16 | lm loss: 7.548276E+00 | loss scale: 8192.0 | grad norm: 58423.396 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 839/ 159576 | consumed samples: 13424 | elapsed time per iteration (ms): 14239.6 | learning rate: 3.723E-06 | global batch size: 16 | lm loss: 7.618182E+00 | loss 
scale: 8192.0 | grad norm: 48500.077 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 840/ 159576 | consumed samples: 13440 | elapsed time per iteration (ms): 13752.3 | learning rate: 3.728E-06 | global batch size: 16 | lm loss: 7.595082E+00 | loss scale: 8192.0 | grad norm: 50825.625 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 841/ 159576 | consumed samples: 13456 | elapsed time per iteration (ms): 14199.3 | learning rate: 3.732E-06 | global batch size: 16 | lm loss: 7.492725E+00 | loss scale: 8192.0 | grad norm: 56977.964 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 842/ 159576 | consumed samples: 13472 | elapsed time per iteration (ms): 13925.4 | learning rate: 3.737E-06 | global batch size: 16 | lm loss: 7.783816E+00 | loss scale: 8192.0 | grad norm: 40797.888 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 843/ 159576 | consumed samples: 13488 | elapsed time per iteration (ms): 14119.4 | learning rate: 3.741E-06 | global batch size: 16 | lm loss: 7.606951E+00 | loss scale: 8192.0 | grad norm: 50890.553 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 844/ 159576 | consumed samples: 13504 | elapsed time per iteration (ms): 13941.8 | learning rate: 3.746E-06 | global batch size: 16 | lm loss: 7.638199E+00 | loss scale: 8192.0 | grad norm: 52652.311 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 845/ 159576 | consumed samples: 13520 | elapsed time per iteration (ms): 14424.1 | learning rate: 3.750E-06 | global batch size: 16 | lm loss: 7.555171E+00 | loss scale: 8192.0 | grad norm: 48298.607 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 846/ 159576 | consumed samples: 13536 | 
elapsed time per iteration (ms): 14202.9 | learning rate: 3.754E-06 | global batch size: 16 | lm loss: 7.651504E+00 | loss scale: 8192.0 | grad norm: 76618.386 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 847/ 159576 | consumed samples: 13552 | elapsed time per iteration (ms): 13785.9 | learning rate: 3.759E-06 | global batch size: 16 | lm loss: 7.914087E+00 | loss scale: 8192.0 | grad norm: 40970.022 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 848/ 159576 | consumed samples: 13568 | elapsed time per iteration (ms): 13892.7 | learning rate: 3.763E-06 | global batch size: 16 | lm loss: 7.714731E+00 | loss scale: 8192.0 | grad norm: 47666.946 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 849/ 159576 | consumed samples: 13584 | elapsed time per iteration (ms): 13608.6 | learning rate: 3.768E-06 | global batch size: 16 | lm loss: 7.566309E+00 | loss scale: 8192.0 | grad norm: 56337.203 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 850/ 159576 | consumed samples: 13600 | elapsed time per iteration (ms): 13752.1 | learning rate: 3.772E-06 | global batch size: 16 | lm loss: 7.621016E+00 | loss scale: 8192.0 | grad norm: 55695.680 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 851/ 159576 | consumed samples: 13616 | elapsed time per iteration (ms): 13514.6 | learning rate: 3.777E-06 | global batch size: 16 | lm loss: 7.510153E+00 | loss scale: 8192.0 | grad norm: 70852.784 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 852/ 159576 | consumed samples: 13632 | elapsed time per iteration (ms): 13536.1 | learning rate: 3.781E-06 | global batch size: 16 | lm loss: 7.417966E+00 | loss scale: 8192.0 | grad norm: 43169.299 | num zeros: 0.0 | 
number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 853/ 159576 | consumed samples: 13648 | elapsed time per iteration (ms): 14116.4 | learning rate: 3.786E-06 | global batch size: 16 | lm loss: 7.490001E+00 | loss scale: 8192.0 | grad norm: 61980.012 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 854/ 159576 | consumed samples: 13664 | elapsed time per iteration (ms): 14372.8 | learning rate: 3.790E-06 | global batch size: 16 | lm loss: 7.555287E+00 | loss scale: 8192.0 | grad norm: 43650.333 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 855/ 159576 | consumed samples: 13680 | elapsed time per iteration (ms): 13154.5 | learning rate: 3.794E-06 | global batch size: 16 | lm loss: 7.628311E+00 | loss scale: 8192.0 | grad norm: 32290.729 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 856/ 159576 | consumed samples: 13696 | elapsed time per iteration (ms): 13509.6 | learning rate: 3.799E-06 | global batch size: 16 | lm loss: 7.757495E+00 | loss scale: 8192.0 | grad norm: 94063.051 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 857/ 159576 | consumed samples: 13712 | elapsed time per iteration (ms): 14015.7 | learning rate: 3.803E-06 | global batch size: 16 | lm loss: 7.733263E+00 | loss scale: 8192.0 | grad norm: 53189.090 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 858/ 159576 | consumed samples: 13728 | elapsed time per iteration (ms): 14357.8 | learning rate: 3.808E-06 | global batch size: 16 | lm loss: 7.570580E+00 | loss scale: 8192.0 | grad norm: 57239.238 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 859/ 159576 | consumed samples: 13744 | elapsed time per iteration (ms): 13954.6 | learning rate: 
3.812E-06 | global batch size: 16 | lm loss: 7.593122E+00 | loss scale: 8192.0 | grad norm: 45414.199 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 860/ 159576 | consumed samples: 13760 | elapsed time per iteration (ms): 14212.3 | learning rate: 3.817E-06 | global batch size: 16 | lm loss: 7.571471E+00 | loss scale: 8192.0 | grad norm: 75659.476 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 861/ 159576 | consumed samples: 13776 | elapsed time per iteration (ms): 14044.0 | learning rate: 3.821E-06 | global batch size: 16 | lm loss: 7.599829E+00 | loss scale: 8192.0 | grad norm: 47651.114 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 862/ 159576 | consumed samples: 13792 | elapsed time per iteration (ms): 13529.5 | learning rate: 3.825E-06 | global batch size: 16 | lm loss: 7.427186E+00 | loss scale: 8192.0 | grad norm: 76377.661 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 863/ 159576 | consumed samples: 13808 | elapsed time per iteration (ms): 14057.3 | learning rate: 3.830E-06 | global batch size: 16 | lm loss: 7.736305E+00 | loss scale: 8192.0 | grad norm: 76320.820 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 864/ 159576 | consumed samples: 13824 | elapsed time per iteration (ms): 14064.2 | learning rate: 3.834E-06 | global batch size: 16 | lm loss: 7.637553E+00 | loss scale: 8192.0 | grad norm: 56695.795 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 865/ 159576 | consumed samples: 13840 | elapsed time per iteration (ms): 14009.0 | learning rate: 3.839E-06 | global batch size: 16 | lm loss: 7.709378E+00 | loss scale: 8192.0 | grad norm: 77647.024 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 
0 | time (ms) iteration 866/ 159576 | consumed samples: 13856 | elapsed time per iteration (ms): 13951.3 | learning rate: 3.843E-06 | global batch size: 16 | lm loss: 7.856131E+00 | loss scale: 8192.0 | grad norm: 85925.999 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 867/ 159576 | consumed samples: 13872 | elapsed time per iteration (ms): 14427.4 | learning rate: 3.848E-06 | global batch size: 16 | lm loss: 7.511599E+00 | loss scale: 8192.0 | grad norm: 50353.044 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 868/ 159576 | consumed samples: 13888 | elapsed time per iteration (ms): 14117.9 | learning rate: 3.852E-06 | global batch size: 16 | lm loss: 7.803133E+00 | loss scale: 8192.0 | grad norm: 73334.122 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 869/ 159576 | consumed samples: 13904 | elapsed time per iteration (ms): 13519.9 | learning rate: 3.857E-06 | global batch size: 16 | lm loss: 7.515793E+00 | loss scale: 8192.0 | grad norm: 73466.425 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 870/ 159576 | consumed samples: 13920 | elapsed time per iteration (ms): 13901.3 | learning rate: 3.861E-06 | global batch size: 16 | lm loss: 7.841221E+00 | loss scale: 8192.0 | grad norm: 74455.188 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 871/ 159576 | consumed samples: 13936 | elapsed time per iteration (ms): 14383.8 | learning rate: 3.865E-06 | global batch size: 16 | lm loss: 7.850037E+00 | loss scale: 8192.0 | grad norm: 49579.751 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 872/ 159576 | consumed samples: 13952 | elapsed time per iteration (ms): 14031.3 | learning rate: 3.870E-06 | global batch size: 16 | lm loss: 7.490081E+00 | 
loss scale: 8192.0 | grad norm: 71074.482 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 873/ 159576 | consumed samples: 13968 | elapsed time per iteration (ms): 13971.5 | learning rate: 3.874E-06 | global batch size: 16 | lm loss: 7.783985E+00 | loss scale: 8192.0 | grad norm: 102193.504 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 874/ 159576 | consumed samples: 13984 | elapsed time per iteration (ms): 14176.3 | learning rate: 3.879E-06 | global batch size: 16 | lm loss: 7.557288E+00 | loss scale: 8192.0 | grad norm: 71546.244 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 875/ 159576 | consumed samples: 14000 | elapsed time per iteration (ms): 14495.9 | learning rate: 3.883E-06 | global batch size: 16 | lm loss: 7.703010E+00 | loss scale: 8192.0 | grad norm: 50279.497 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 876/ 159576 | consumed samples: 14016 | elapsed time per iteration (ms): 13722.6 | learning rate: 3.888E-06 | global batch size: 16 | lm loss: 7.542592E+00 | loss scale: 8192.0 | grad norm: 44841.536 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 877/ 159576 | consumed samples: 14032 | elapsed time per iteration (ms): 13946.5 | learning rate: 3.892E-06 | global batch size: 16 | lm loss: 7.776785E+00 | loss scale: 8192.0 | grad norm: 109756.647 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 878/ 159576 | consumed samples: 14048 | elapsed time per iteration (ms): 13948.7 | learning rate: 3.896E-06 | global batch size: 16 | lm loss: 7.728590E+00 | loss scale: 8192.0 | grad norm: 70820.820 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 879/ 159576 | consumed samples: 
14064 | elapsed time per iteration (ms): 13882.9 | learning rate: 3.901E-06 | global batch size: 16 | lm loss: 7.672616E+00 | loss scale: 8192.0 | grad norm: 44570.920 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 880/ 159576 | consumed samples: 14080 | elapsed time per iteration (ms): 14042.4 | learning rate: 3.905E-06 | global batch size: 16 | lm loss: 7.680589E+00 | loss scale: 8192.0 | grad norm: 124008.380 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 881/ 159576 | consumed samples: 14096 | elapsed time per iteration (ms): 13930.7 | learning rate: 3.910E-06 | global batch size: 16 | lm loss: 7.501089E+00 | loss scale: 8192.0 | grad norm: 46056.517 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 882/ 159576 | consumed samples: 14112 | elapsed time per iteration (ms): 14239.7 | learning rate: 3.914E-06 | global batch size: 16 | lm loss: 7.571886E+00 | loss scale: 8192.0 | grad norm: 66612.529 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 883/ 159576 | consumed samples: 14128 | elapsed time per iteration (ms): 13486.8 | learning rate: 3.919E-06 | global batch size: 16 | lm loss: 7.536567E+00 | loss scale: 8192.0 | grad norm: 62829.154 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 884/ 159576 | consumed samples: 14144 | elapsed time per iteration (ms): 14209.0 | learning rate: 3.923E-06 | global batch size: 16 | lm loss: 7.794725E+00 | loss scale: 8192.0 | grad norm: 67729.342 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 885/ 159576 | consumed samples: 14160 | elapsed time per iteration (ms): 13720.4 | learning rate: 3.928E-06 | global batch size: 16 | lm loss: 7.468060E+00 | loss scale: 8192.0 | grad norm: 44457.501 | num zeros: 
0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 886/ 159576 | consumed samples: 14176 | elapsed time per iteration (ms): 13867.7 | learning rate: 3.932E-06 | global batch size: 16 | lm loss: 7.478938E+00 | loss scale: 8192.0 | grad norm: 45629.682 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 887/ 159576 | consumed samples: 14192 | elapsed time per iteration (ms): 13805.2 | learning rate: 3.936E-06 | global batch size: 16 | lm loss: 7.427522E+00 | loss scale: 8192.0 | grad norm: 59355.003 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 888/ 159576 | consumed samples: 14208 | elapsed time per iteration (ms): 14520.3 | learning rate: 3.941E-06 | global batch size: 16 | lm loss: 7.602240E+00 | loss scale: 8192.0 | grad norm: 45450.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 889/ 159576 | consumed samples: 14224 | elapsed time per iteration (ms): 13870.2 | learning rate: 3.945E-06 | global batch size: 16 | lm loss: 7.682034E+00 | loss scale: 8192.0 | grad norm: 51153.138 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 890/ 159576 | consumed samples: 14240 | elapsed time per iteration (ms): 13708.4 | learning rate: 3.950E-06 | global batch size: 16 | lm loss: 7.558862E+00 | loss scale: 8192.0 | grad norm: 46389.657 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 891/ 159576 | consumed samples: 14256 | elapsed time per iteration (ms): 13645.4 | learning rate: 3.954E-06 | global batch size: 16 | lm loss: 7.527663E+00 | loss scale: 8192.0 | grad norm: 86582.230 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 892/ 159576 | consumed samples: 14272 | elapsed time per iteration (ms): 13652.2 | learning 
rate: 3.959E-06 | global batch size: 16 | lm loss: 7.675562E+00 | loss scale: 8192.0 | grad norm: 68924.015 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 893/ 159576 | consumed samples: 14288 | elapsed time per iteration (ms): 14020.9 | learning rate: 3.963E-06 | global batch size: 16 | lm loss: 7.534761E+00 | loss scale: 8192.0 | grad norm: 47359.573 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 894/ 159576 | consumed samples: 14304 | elapsed time per iteration (ms): 13841.4 | learning rate: 3.967E-06 | global batch size: 16 | lm loss: 7.447322E+00 | loss scale: 8192.0 | grad norm: 51692.050 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 895/ 159576 | consumed samples: 14320 | elapsed time per iteration (ms): 14037.6 | learning rate: 3.972E-06 | global batch size: 16 | lm loss: 7.507210E+00 | loss scale: 8192.0 | grad norm: 64045.210 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 896/ 159576 | consumed samples: 14336 | elapsed time per iteration (ms): 14109.9 | learning rate: 3.976E-06 | global batch size: 16 | lm loss: 7.523023E+00 | loss scale: 8192.0 | grad norm: 62130.023 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 897/ 159576 | consumed samples: 14352 | elapsed time per iteration (ms): 14567.0 | learning rate: 3.981E-06 | global batch size: 16 | lm loss: 7.609581E+00 | loss scale: 8192.0 | grad norm: 45111.563 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 898/ 159576 | consumed samples: 14368 | elapsed time per iteration (ms): 13613.4 | learning rate: 3.985E-06 | global batch size: 16 | lm loss: 7.677504E+00 | loss scale: 8192.0 | grad norm: 77037.256 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan 
iterations: 0 | time (ms) iteration 899/ 159576 | consumed samples: 14384 | elapsed time per iteration (ms): 13889.7 | learning rate: 3.990E-06 | global batch size: 16 | lm loss: 7.463535E+00 | loss scale: 8192.0 | grad norm: 63218.567 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 900/ 159576 | consumed samples: 14400 | elapsed time per iteration (ms): 13953.1 | learning rate: 3.994E-06 | global batch size: 16 | lm loss: 7.512316E+00 | loss scale: 8192.0 | grad norm: 45889.461 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 901/ 159576 | consumed samples: 14416 | elapsed time per iteration (ms): 14162.8 | learning rate: 3.999E-06 | global batch size: 16 | lm loss: 7.882708E+00 | loss scale: 8192.0 | grad norm: 42823.467 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 902/ 159576 | consumed samples: 14432 | elapsed time per iteration (ms): 13923.6 | learning rate: 4.003E-06 | global batch size: 16 | lm loss: 7.662213E+00 | loss scale: 8192.0 | grad norm: 61513.464 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 903/ 159576 | consumed samples: 14448 | elapsed time per iteration (ms): 14309.5 | learning rate: 4.007E-06 | global batch size: 16 | lm loss: 7.560106E+00 | loss scale: 8192.0 | grad norm: 69145.911 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 904/ 159576 | consumed samples: 14464 | elapsed time per iteration (ms): 13872.6 | learning rate: 4.012E-06 | global batch size: 16 | lm loss: 7.580536E+00 | loss scale: 8192.0 | grad norm: 50555.734 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 905/ 159576 | consumed samples: 14480 | elapsed time per iteration (ms): 13660.1 | learning rate: 4.016E-06 | global batch size: 16 | lm loss: 
7.370582E+00 | loss scale: 8192.0 | grad norm: 58747.890 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 906/ 159576 | consumed samples: 14496 | elapsed time per iteration (ms): 14302.6 | learning rate: 4.021E-06 | global batch size: 16 | lm loss: 7.578561E+00 | loss scale: 8192.0 | grad norm: 51271.016 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 907/ 159576 | consumed samples: 14512 | elapsed time per iteration (ms): 13761.7 | learning rate: 4.025E-06 | global batch size: 16 | lm loss: 7.886317E+00 | loss scale: 8192.0 | grad norm: 103662.947 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 908/ 159576 | consumed samples: 14528 | elapsed time per iteration (ms): 13804.9 | learning rate: 4.030E-06 | global batch size: 16 | lm loss: 7.671743E+00 | loss scale: 8192.0 | grad norm: 73682.928 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 909/ 159576 | consumed samples: 14544 | elapsed time per iteration (ms): 13551.5 | learning rate: 4.034E-06 | global batch size: 16 | lm loss: 7.644366E+00 | loss scale: 8192.0 | grad norm: 44749.062 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 910/ 159576 | consumed samples: 14560 | elapsed time per iteration (ms): 14145.8 | learning rate: 4.038E-06 | global batch size: 16 | lm loss: 7.575992E+00 | loss scale: 8192.0 | grad norm: 123440.918 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 911/ 159576 | consumed samples: 14576 | elapsed time per iteration (ms): 13697.4 | learning rate: 4.043E-06 | global batch size: 16 | lm loss: 7.622074E+00 | loss scale: 8192.0 | grad norm: 106507.983 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 912/ 159576 | 
consumed samples: 14592 | elapsed time per iteration (ms): 13234.0 | learning rate: 4.047E-06 | global batch size: 16 | lm loss: 7.362756E+00 | loss scale: 8192.0 | grad norm: 47407.480 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 913/ 159576 | consumed samples: 14608 | elapsed time per iteration (ms): 13588.2 | learning rate: 4.052E-06 | global batch size: 16 | lm loss: 7.463619E+00 | loss scale: 8192.0 | grad norm: 52603.656 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 914/ 159576 | consumed samples: 14624 | elapsed time per iteration (ms): 13866.4 | learning rate: 4.056E-06 | global batch size: 16 | lm loss: 7.559254E+00 | loss scale: 8192.0 | grad norm: 75070.449 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 915/ 159576 | consumed samples: 14640 | elapsed time per iteration (ms): 13445.5 | learning rate: 4.061E-06 | global batch size: 16 | lm loss: 7.466935E+00 | loss scale: 8192.0 | grad norm: 84703.653 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 916/ 159576 | consumed samples: 14656 | elapsed time per iteration (ms): 13592.3 | learning rate: 4.065E-06 | global batch size: 16 | lm loss: 7.530110E+00 | loss scale: 8192.0 | grad norm: 68897.329 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 917/ 159576 | consumed samples: 14672 | elapsed time per iteration (ms): 13623.0 | learning rate: 4.070E-06 | global batch size: 16 | lm loss: 7.709665E+00 | loss scale: 8192.0 | grad norm: 42674.546 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 918/ 159576 | consumed samples: 14688 | elapsed time per iteration (ms): 13933.4 | learning rate: 4.074E-06 | global batch size: 16 | lm loss: 7.340624E+00 | loss scale: 8192.0 | grad norm: 
62308.866 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 919/ 159576 | consumed samples: 14704 | elapsed time per iteration (ms): 13383.8 | learning rate: 4.078E-06 | global batch size: 16 | lm loss: 7.633225E+00 | loss scale: 8192.0 | grad norm: 101681.252 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 920/ 159576 | consumed samples: 14720 | elapsed time per iteration (ms): 13577.7 | learning rate: 4.083E-06 | global batch size: 16 | lm loss: 7.753546E+00 | loss scale: 8192.0 | grad norm: 64758.234 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 921/ 159576 | consumed samples: 14736 | elapsed time per iteration (ms): 13615.2 | learning rate: 4.087E-06 | global batch size: 16 | lm loss: 7.587958E+00 | loss scale: 8192.0 | grad norm: 50894.580 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 922/ 159576 | consumed samples: 14752 | elapsed time per iteration (ms): 13349.8 | learning rate: 4.092E-06 | global batch size: 16 | lm loss: 7.769899E+00 | loss scale: 8192.0 | grad norm: 142837.991 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 923/ 159576 | consumed samples: 14768 | elapsed time per iteration (ms): 13909.6 | learning rate: 4.096E-06 | global batch size: 16 | lm loss: 7.624977E+00 | loss scale: 8192.0 | grad norm: 83848.961 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 924/ 159576 | consumed samples: 14784 | elapsed time per iteration (ms): 13544.9 | learning rate: 4.101E-06 | global batch size: 16 | lm loss: 7.603238E+00 | loss scale: 8192.0 | grad norm: 56820.812 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 925/ 159576 | consumed samples: 14800 | elapsed time per iteration 
(ms): 14229.7 | learning rate: 4.105E-06 | global batch size: 16 | lm loss: 7.706733E+00 | loss scale: 8192.0 | grad norm: 76791.134 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 926/ 159576 | consumed samples: 14816 | elapsed time per iteration (ms): 13216.1 | learning rate: 4.109E-06 | global batch size: 16 | lm loss: 7.619715E+00 | loss scale: 8192.0 | grad norm: 71541.361 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 927/ 159576 | consumed samples: 14832 | elapsed time per iteration (ms): 13878.1 | learning rate: 4.114E-06 | global batch size: 16 | lm loss: 7.712871E+00 | loss scale: 8192.0 | grad norm: 73909.646 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 928/ 159576 | consumed samples: 14848 | elapsed time per iteration (ms): 13952.8 | learning rate: 4.118E-06 | global batch size: 16 | lm loss: 7.413386E+00 | loss scale: 8192.0 | grad norm: 57651.288 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 929/ 159576 | consumed samples: 14864 | elapsed time per iteration (ms): 13472.5 | learning rate: 4.123E-06 | global batch size: 16 | lm loss: 7.559020E+00 | loss scale: 8192.0 | grad norm: 91128.588 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 930/ 159576 | consumed samples: 14880 | elapsed time per iteration (ms): 13393.9 | learning rate: 4.127E-06 | global batch size: 16 | lm loss: 7.636448E+00 | loss scale: 8192.0 | grad norm: 48957.093 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 931/ 159576 | consumed samples: 14896 | elapsed time per iteration (ms): 13547.0 | learning rate: 4.132E-06 | global batch size: 16 | lm loss: 7.639730E+00 | loss scale: 8192.0 | grad norm: 110788.722 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | time (ms) iteration 932/ 159576 | consumed samples: 14912 | elapsed time per iteration (ms): 14018.3 | learning rate: 4.136E-06 | global batch size: 16 | lm loss: 7.652531E+00 | loss scale: 8192.0 | grad norm: 96359.374 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 933/ 159576 | consumed samples: 14928 | elapsed time per iteration (ms): 13449.4 | learning rate: 4.141E-06 | global batch size: 16 | lm loss: 7.671719E+00 | loss scale: 8192.0 | grad norm: 60936.312 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 934/ 159576 | consumed samples: 14944 | elapsed time per iteration (ms): 13624.9 | learning rate: 4.145E-06 | global batch size: 16 | lm loss: 7.672961E+00 | loss scale: 8192.0 | grad norm: 45848.114 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 935/ 159576 | consumed samples: 14960 | elapsed time per iteration (ms): 13787.5 | learning rate: 4.149E-06 | global batch size: 16 | lm loss: 7.740889E+00 | loss scale: 8192.0 | grad norm: 140359.981 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 936/ 159576 | consumed samples: 14976 | elapsed time per iteration (ms): 13643.3 | learning rate: 4.154E-06 | global batch size: 16 | lm loss: 7.595088E+00 | loss scale: 8192.0 | grad norm: 125926.574 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 937/ 159576 | consumed samples: 14992 | elapsed time per iteration (ms): 13588.2 | learning rate: 4.158E-06 | global batch size: 16 | lm loss: 7.580822E+00 | loss scale: 8192.0 | grad norm: 88915.383 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 938/ 159576 | consumed samples: 15008 | elapsed time per iteration (ms): 13606.3 | learning rate: 4.163E-06 | 
global batch size: 16 | lm loss: 7.766950E+00 | loss scale: 8192.0 | grad norm: 88671.645 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 939/ 159576 | consumed samples: 15024 | elapsed time per iteration (ms): 13894.4 | learning rate: 4.167E-06 | global batch size: 16 | lm loss: 7.578055E+00 | loss scale: 8192.0 | grad norm: 66434.885 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 940/ 159576 | consumed samples: 15040 | elapsed time per iteration (ms): 13885.0 | learning rate: 4.172E-06 | global batch size: 16 | lm loss: 7.837738E+00 | loss scale: 8192.0 | grad norm: 64490.261 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 941/ 159576 | consumed samples: 15056 | elapsed time per iteration (ms): 14127.9 | learning rate: 4.176E-06 | global batch size: 16 | lm loss: 7.961911E+00 | loss scale: 8192.0 | grad norm: 155493.780 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 942/ 159576 | consumed samples: 15072 | elapsed time per iteration (ms): 14120.5 | learning rate: 4.180E-06 | global batch size: 16 | lm loss: 7.581886E+00 | loss scale: 8192.0 | grad norm: 84829.182 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) saving checkpoint at iteration 942 to /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints [2021-09-24 05:51:49,558] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/global_step942/mp_rank_00_model_states.pt successfully saved checkpoint at iteration 942 to /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints time (ms) | save-checkpoint: 17459.68 [exiting program after 110.12040019432703 minutes] datetime: 2021-09-24 05:52:01 ***************************************** Setting OMP_NUM_THREADS 
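The banner above is printed once per launched process; it defaults OMP_NUM_THREADS to 1 so that ranks do not oversubscribe the node's cores. A minimal sketch of that behaviour (the helper name is hypothetical; only the default-to-1, respect-user-override policy comes from the warning itself):

```python
import os

def apply_omp_default(num_threads=1):
    """Default OMP_NUM_THREADS for this process, keeping any value the
    user has already exported (mirrors the launcher warning above)."""
    os.environ.setdefault("OMP_NUM_THREADS", str(num_threads))
    return os.environ["OMP_NUM_THREADS"]
```

Setting it before importing numerical libraries matters, since many of them read the variable once at import time.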
--------------------------------------------------
DeepSpeed C++/CUDA extension op report
--------------------------------------------------
NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.
--------------------------------------------------
JIT compiled ops requires ninja
ninja .................. [OKAY]
--------------------------------------------------
op name ................ installed .. compatible
--------------------------------------------------
cpu_adam ............... [YES] ...... [OKAY]
fused_adam ............. [NO] ....... [OKAY]
fused_lamb ............. [NO] ....... [OKAY]
sparse_attn ............ [NO] ....... [OKAY]
transformer ............ [NO] ....... [OKAY]
stochastic_transformer . [NO] ....... [OKAY]
--------------------------------------------------
 [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`.
async_io ............... [NO] ....... [NO]
transformer_inference .. [NO] ....... [OKAY]
utils .................. [YES] ...... [OKAY]
quantizer .............. [NO] ....... [OKAY]
--------------------------------------------------
[OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] DeepSpeed general environment info:DeepSpeed general environment info: torch install path torch install path............... ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch versiontorch version ........................................ 1.8.11.8.1 torch cuda version torch cuda version............... ...............11.1 11.1nvcc version nvcc version..................... .....................11.2 11.2deepspeed install path deepspeed install path........... ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']deepspeed info deepspeed info................... ...................0.4.2+bc17042, bc17042, big-science 0.4.2+bc17042, bc17042, big-sciencedeepspeed wheel compiled w. deepspeed wheel compiled w....... ......torch 1.8, cuda 11.1 torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 DeepSpeed general environment info:torch cuda version ............... 11.1 nvcc version .....................torch install path 11.2 ...............deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']deepspeed info ................... torch version0.4.2+bc17042, bc17042, big-science ....................deepspeed wheel compiled w. 1.8.1...... torch 1.8, cuda 11.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... 
['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ---------------------------------------------------------------------------------------------------- DeepSpeed C++/CUDA extension op reportDeepSpeed C++/CUDA extension op report ---------------------------------------------------------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. ---------------------------------------------------------------------------------------------------- JIT compiled ops requires ninjaJIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] ninjaninja .................................... 
[OKAY][OKAY] ---------------------------------------------------------------------------------------------------- op nameop name ................................ installedinstalled .... compatiblecompatible ---------------------------------------------------------------------------------------------------- cpu_adamcpu_adam .............................. [YES][YES] ............ [OKAY][OKAY] fused_adamfused_adam .......................... [NO] [NO]....... ....... [OKAY][OKAY] fused_lamb .............fused_lamb [NO]............. .......[NO] [OKAY]....... [OKAY] sparse_attnsparse_attn ........................ [NO][NO] .............. [OKAY][OKAY] transformertransformer ........................ [NO][NO] .............. [OKAY][OKAY] stochastic_transformer stochastic_transformer . [NO]. .......[NO] [OKAY]....... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. 
-------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] ninja .................. [OKAY] -------------------------------------------------- fused_adam op name............. ................[NO] installed....... ..[OKAY] compatible -------------------------------------------------- fused_lamb ............. [NO] ....... [OKAY] cpu_adam ............... [YES] ...... [OKAY] sparse_attn ............ [NO] .......fused_adam [OKAY]............. [NO] .......transformer [OKAY]............ [NO] .......fused_lamb [OKAY]............. [NO] ....... [OKAY]stochastic_transformer . [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... 
[OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] --------------------------------------------------sparse_attn ............ [NO] DeepSpeed C++/CUDA extension op report....... [OKAY]-------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.transformer ............-------------------------------------------------- [NO]JIT compiled ops requires ninja ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. 
compatible -------------------------------------------------- ninja .................. [OKAY] cpu_adam-------------------------------------------------- ............... op name[YES] ................ ......installed ..[OKAY] compatible -------------------------------------------------- fused_adamcpu_adam ............................ [YES][NO] ...... .......[OKAY] [OKAY] fused_lamb ............. [NO] ....... fused_adam[OKAY] ............. [NO] ....... [OKAY] fused_lamb ............. [NO] sparse_attn....... ............[OKAY] [NO] ....... [OKAY] transformer ............ [NO] .......sparse_attn [OKAY]............ [NO] ....... stochastic_transformer[OKAY] .transformer [NO]............ .......[NO] [OKAY]....... [OKAY] stochastic_transformer . [NO] ....... [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. 
compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ...............async_io [NO] ...................... [NO][NO] ....... [NO] transformer_inferencetransformer_inference .... [NO][NO] .............. [OKAY][OKAY] utils ..................utils [YES].................. ......[YES] [OKAY]...... [OKAY] quantizer quantizer.............. ..............[NO] [NO]....... .......[OKAY] [OKAY] -------------------------------------------------- --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... 
[OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown ******** Git info for Megatron: git_hash=unknown git_branch=unknown **** -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... 
[OKAY] stochastic_transformer . [NO] ....... [OKAY] ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_ioasync_io .............................. [NO][NO] .............. [NO][NO] transformer_inferencetransformer_inference .... [NO][NO] .............. [OKAY][OKAY] utils utils.................. ..................[YES] [YES]...... ......[OKAY] [OKAY] quantizer ..............quantizer [NO].............. .......[NO] [OKAY]....... [OKAY] -------------------------------------------------- --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... 
[OKAY] -------------------------------------------------- /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown ******** Git info for Megatron: git_hash=unknown git_branch=unknown ****  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO]async_io ....... [NO]............... [NO] ....... [NO] transformer_inference .. [NO]transformer_inference ......... [OKAY][NO] ....... [OKAY] utils ..................utils [YES].................. ......[YES] [OKAY]...... [OKAY] quantizer ..............quantizer [NO].............. .......[NO] [OKAY]....... [OKAY] -------------------------------------------------- --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`.  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... async_io[NO] ...................... [NO][NO] ....... 
[NO] transformer_inference .. transformer_inference[NO] ......... [NO][OKAY] ....... [OKAY] utils utils.................. ..................[YES] [YES]...... ......[OKAY] [OKAY] quantizer quantizer ............................ [NO][NO] .............. [OKAY][OKAY] -------------------------------------------------- -------------------------------------------------- -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... 
[OKAY] -------------------------------------------------- -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ...............async_io [NO] ...................... [NO][NO] ....... [NO] transformer_inference ..transformer_inference [NO].. .......[NO] [OKAY]....... [OKAY] utils ..................utils [YES].................. ......[YES] [OKAY]...... [OKAY] quantizerquantizer ............................ [NO][NO] .............. [OKAY][OKAY] ---------------------------------------------------------------------------------------------------- ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. 
-------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... 
[OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. 
-------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja DeepSpeed general environment info: DeepSpeed general environment info: torch install path ...............torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']torch version .................... torch version1.8.1 .................... torch cuda version1.8.1 ............... torch cuda version11.1 ............... nvcc version11.1 ..................... nvcc version11.2 .....................deepspeed install path 11.2........... deepspeed install path ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']........... deepspeed info ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']................... deepspeed info0.4.2+bc17042, bc17042, big-science ...................deepspeed wheel compiled w. 0.4.2+bc17042, bc17042, big-science ...... deepspeed wheel compiled w.torch 1.8, cuda 11.1 ...... torch 1.8, cuda 11.1 ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] ninja .................. 
--------------------------------------------------
DeepSpeed C++/CUDA extension op report
--------------------------------------------------
NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.
--------------------------------------------------
JIT compiled ops requires ninja
--------------------------------------------------
ninja .................. [OKAY]
--------------------------------------------------
op name ................ installed .. compatible
--------------------------------------------------
cpu_adam ............... [YES] ...... [OKAY]
fused_adam ............. [NO] ....... [OKAY]
fused_lamb ............. [NO] ....... [OKAY]
sparse_attn ............ [NO] ....... [OKAY]
transformer ............ [NO] ....... [OKAY]
stochastic_transformer . [NO] ....... [OKAY]
--------------------------------------------------
 [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`.
async_io ............... [NO] ....... [NO]
transformer_inference .. [NO] ....... [OKAY]
utils .................. [YES] ...... [OKAY]
quantizer .............. [NO] ....... [OKAY]
--------------------------------------------------
DeepSpeed general environment info:
torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']
torch version .................... 1.8.1
torch cuda version ............... 11.1
nvcc version ..................... 11.2
deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']
deepspeed info ................... 0.4.2+bc17042, bc17042, big-science
deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1
/bin/sh: line 0: type: git: not found
**** Git info for Megatron: git_hash=unknown git_branch=unknown ****
[NO] ....... [NO] cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] transformer_inference .. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] sparse_attn ............ [NO] ....... [OKAY] quantizer .............. [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] -------------------------------------------------- stochastic_transformer . [NO] ....... [OKAY] /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 DeepSpeed general environment info:torch cuda version ............... 11.1 nvcc versiontorch install path ..................... ...............11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] deepspeed info ...................torch version ....................0.4.2+bc17042, bc17042, big-science 1.8.1deepspeed wheel compiled w. ......torch cuda version torch 1.8, cuda 11.1............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... 
torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. 
-------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_ioasync_io .............................. [NO][NO] .............. [NO][NO] transformer_inference .. transformer_inference[NO] ......... [NO][OKAY] ....... [OKAY] utils .................. [YES]utils ........................ [OKAY][YES] ...... [OKAY] quantizer .............. quantizer[NO] .............. .......[NO] [OKAY]....... [OKAY] -------------------------------------------------- -------------------------------------------------- ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] /bin/sh: line 0: type: git: not found fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ 
[NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] **** Git info for Megatron: git_hash=unknown git_branch=unknown ****  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. utils .................. [YES] ...... [OKAY] async_io ............... [NO] ....... [NO] quantizer .............. [NO] ....... [OKAY] transformer_inference .. [NO] ....... [OKAY] -------------------------------------------------- utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown ****  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] .......async_io [NO]............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY]transformer_inference .. [NO] .......utils [OKAY].................. [YES] ...... [OKAY] utils .................. quantizer[YES] .................... [NO][OKAY] ....... [OKAY] quantizer .............. --------------------------------------------------[NO] ....... 
[OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... 
[OKAY] -------------------------------------------------- /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... 
[OKAY] ---------------------------------------------------------------------------------------------------- DeepSpeed C++/CUDA extension op report DeepSpeed C++/CUDA extension op report-------------------------------------------------- --------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system stochastic_transformer . [NO] ....... [OKAY] meet the required dependencies to JIT install the op. --------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. JIT compiled ops requires ninja -------------------------------------------------- JIT compiled ops requires ninja  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io async_io............... [NO] ...................... [NO][NO] ....... [NO] transformer_inferencetransformer_inference .... [NO][NO] .............. [OKAY][OKAY] utilsutils .................................... [YES][YES] ............ [OKAY][OKAY] quantizerquantizer ............................ [NO][NO] .............. [OKAY][OKAY] -------------------------------------------------- --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. 
-------------------------------------------------- async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- ninjaninja .................................... [OKAY] [OKAY] -------------------------------------------------- --------------------------------------------------op name op name................ installed................ .. installedcompatible ..-------------------------------------------------- compatible -------------------------------------------------- cpu_adam ............... [YES] cpu_adam...... [OKAY]............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_adam ............. fused_lamb[NO] .................... [NO] [OKAY]....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY]sparse_attn transformer............ ............[NO] [NO] .............. [OKAY][OKAY] transformerstochastic_transformer ............ .[NO] [NO]....... ....... [OKAY][OKAY] stochastic_transformer . [NO] ....... 
[OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... 
[OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. stochastic_transformer . [NO] ....... [OKAY] async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown ******** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown ****  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. 
Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. transformer_inference .. [NO] ....... [OKAY] async_io ............... [NO] ....... [NO] utils .................. [YES] ...... [OKAY] transformer_inference .. [NO] ....... [OKAY] quantizer .............. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] -------------------------------------------------- quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- /bin/sh: line 0: type: git: not found -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. 
[NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] **** Git info for Megatron: git_hash=unknown git_branch=unknown **** -------------------------------------------------- ninja .................. [OKAY] -------------------------------------------------- -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] DeepSpeed C++/CUDA extension op report fused_adam ............. [NO] ....... [OKAY] -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... 
[OKAY] DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. 
-------------------------------------------------- JIT compiled ops requires ninja /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown ******** Git info for Megatron: git_hash=unknown git_branch=unknown **** ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... 
torch 1.8, cuda 11.1
--------------------------------------------------
DeepSpeed C++/CUDA extension op report
--------------------------------------------------
NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.
--------------------------------------------------
JIT compiled ops requires ninja
ninja .................. [OKAY]
--------------------------------------------------
op name ................ installed .. compatible
--------------------------------------------------
cpu_adam ............... [YES] ...... [OKAY]
fused_adam ............. [NO] ....... [OKAY]
fused_lamb ............. [NO] ....... [OKAY]
sparse_attn ............ [NO] ....... [OKAY]
transformer ............ [NO] ....... [OKAY]
stochastic_transformer . [NO] ....... [OKAY]
 [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`.
async_io ............... [NO] ....... [NO]
transformer_inference .. [NO] ....... [OKAY]
utils .................. [YES] ...... [OKAY]
quantizer .............. [NO] ....... [OKAY]
--------------------------------------------------
/bin/sh: line 0: type: git: not found
**** Git info for Megatron: git_hash=unknown git_branch=unknown ****
DeepSpeed general environment info:
torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']
torch version .................... 1.8.1
torch cuda version ............... 11.1
nvcc version ..................... 11.2
deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']
deepspeed info ................... 0.4.2+bc17042, bc17042, big-science
deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1
[... identical op report and environment info repeated by each remaining process; interleaved duplicates elided ...]
[OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... 
[OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja DeepSpeed general environment info:DeepSpeed general environment info: torch install pathtorch install path .............................. ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch versiontorch version ........................................ 1.8.11.8.1 torch cuda versiontorch cuda version .............................. 11.111.1 nvcc versionnvcc version .......................................... 11.211.2 deepspeed install pathdeepspeed install path ........... ........... 
['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info deepspeed info................... ...................0.4.2+bc17042, bc17042, big-science 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. deepspeed wheel compiled w....... ......torch 1.8, cuda 11.1 torch 1.8, cuda 11.1 DeepSpeed general environment info:DeepSpeed general environment info: torch install pathtorch install path .............................. ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch versiontorch version ........................................ 1.8.11.8.1 torch cuda versiontorch cuda version .............................. 11.111.1 nvcc versionnvcc version .......................................... 11.211.2 deepspeed install pathdeepspeed install path ........... ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info deepspeed info................... ...................0.4.2+bc17042, bc17042, big-science 0.4.2+bc17042, bc17042, big-sciencedeepspeed wheel compiled w. deepspeed wheel compiled w....... ......torch 1.8, cuda 11.1 torch 1.8, cuda 11.1 ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] DeepSpeed general environment info: torch install path ............... 
['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] ninja .................. [OKAY] stochastic_transformer . [NO] ....... [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... 
[OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] DeepSpeed general environment info:DeepSpeed general environment info: torch install pathtorch install path .............................. ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch versiontorch version ........................................ 1.8.11.8.1 torch cuda versiontorch cuda version .............................. 11.111.1 nvcc versionnvcc version .......................................... 11.211.2 deepspeed install pathdeepspeed install path ...................... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed infodeepspeed info ...................................... 0.4.2+bc17042, bc17042, big-science0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w.deepspeed wheel compiled w. ............ torch 1.8, cuda 11.1torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. 
Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report ---------------------------------------------------------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. DeepSpeed C++/CUDA extension op report-------------------------------------------------- JIT compiled ops requires ninja-------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja DeepSpeed general environment info:DeepSpeed general environment info: ninja .................. [OKAY] -------------------------------------------------- torch install pathtorch install path .............................. 
['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch versiontorch version ........................................ 1.8.11.8.1 op name ................ installed .. compatible torch cuda versiontorch cuda version .............................. 11.111.1 nvcc versionnvcc version .......................................... 11.211.2 -------------------------------------------------- deepspeed install pathdeepspeed install path ...................... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed infodeepspeed info ...................................... 0.4.2+bc17042, bc17042, big-science0.4.2+bc17042, bc17042, big-science cpu_adam ............... [YES] ...... [OKAY] deepspeed wheel compiled w.deepspeed wheel compiled w. ............ torch 1.8, cuda 11.1torch 1.8, cuda 11.1 fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] DeepSpeed general environment info: transformer ............ [NO] ....... [OKAY] torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] stochastic_transformer . [NO] ....... [OKAY] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... 
[OKAY] fused_adam ............. [NO] ....... [OKAY]ninja .................. fused_lamb[OKAY] ............. --------------------------------------------------[NO] .......op name [OKAY]................ installed .. compatible -------------------------------------------------- sparse_attn ............cpu_adam [NO]............... .......[YES] [OKAY]...... [OKAY]transformer ............ [NO] ....... [OKAY] stochastic_transformerfused_adam .............. [NO][NO] .............. [OKAY][OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 
0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] /bin/sh: line 0: type: git: not found torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 **** Git info for Megatron: git_hash=unknown git_branch=unknown **** -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja DeepSpeed general environment info:  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] async_ioasync_io .............................. [NO][NO] .............. [NO][NO] torch version .................... 1.8.1 transformer_inferencetransformer_inference .... [NO][NO] .............. [OKAY][OKAY] torch cuda version ............... 11.1 nvcc version ..................... 11.2 utilsutils .................................... [YES] [YES]...... ......[OKAY] [OKAY] deepspeed install path ........... 
['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] quantizerquantizer ............................ [NO][NO] .............. [OKAY][OKAY] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 ---------------------------------------------------------------------------------------------------- ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] ninja .................. [OKAY] transformer_inference .. [NO] ....... [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- utils .................. [YES] ...... [OKAY] cpu_adam ............... [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown ****  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`.  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. 
Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- -------------------------------------------------- DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] async_io ............... [NO] ....... [NO] **** Git info for Megatron: git_hash=unknown git_branch=unknown **** deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... 
[OKAY] -------------------------------------------------- DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. 
-------------------------------------------------- JIT compiled ops requires ninja  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- fused_lamb ............. [NO] ....... [OKAY] JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja sparse_attn ............ [NO] ....... 
[OKAY]
transformer ............ [NO] ....... [OKAY]
stochastic_transformer . [NO] ....... [OKAY]
--------------------------------------------------
DeepSpeed C++/CUDA extension op report
--------------------------------------------------
NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.
--------------------------------------------------
JIT compiled ops requires ninja
ninja .................. [OKAY]
--------------------------------------------------
op name ................ installed .. compatible
--------------------------------------------------
cpu_adam ............... [YES] ...... [OKAY]
fused_adam ............. [NO] ....... [OKAY]
fused_lamb ............. [NO] ....... [OKAY]
sparse_attn ............ [NO] ....... [OKAY]
transformer ............ [NO] ....... [OKAY]
stochastic_transformer . [NO] ....... [OKAY]
--------------------------------------------------
 [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`.
async_io ............... [NO] ....... [NO]
transformer_inference .. [NO] ....... [OKAY]
utils .................. [YES] ...... [OKAY]
quantizer .............. [NO] ....... [OKAY]
--------------------------------------------------
/bin/sh: line 0: type: git: not found
**** Git info for Megatron: git_hash=unknown git_branch=unknown ****
DeepSpeed general environment info:
torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']
torch version .................... 1.8.1
torch cuda version ............... 11.1
nvcc version ..................... 11.2
deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']
deepspeed info ................... 0.4.2+bc17042, bc17042, big-science
deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1
 [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing.
Can be fixed by: `apt install libaio-dev`. [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. transformer ............ [NO] ....... [OKAY] async_ioasync_io .............................. [NO][NO] .............. [NO][NO] transformer_inferencetransformer_inference .... [NO][NO] ....... .......[OKAY] stochastic_transformer . [NO] ....... [OKAY] [OKAY] **** Git info for Megatron: git_hash=unknown git_branch=unknown **** utilsutils .................................... [YES][YES] ............ [OKAY][OKAY] quantizerquantizer ............................ [NO][NO] .............. [OKAY][OKAY] ----------------------------------------------------------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. 
-------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. 
[NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- fused_lamb ............. [NO] ....... [OKAY] JIT compiled ops requires ninja sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... 
[OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... 
['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] **** Git info for Megatron: git_hash=unknown git_branch=unknown ******** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. 
-------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown ****  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] async_io ............... [NO] ....... [NO] fused_lamb ............. [NO] ....... [OKAY] transformer_inference .. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY]ninja utils .................. [YES] ...... [OKAY] transformer.................. ............ [NO][OKAY] ....... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- -------------------------------------------------- op name stochastic_transformer................ installed. 
..[NO] .......compatible [OKAY] -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... 
['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 
0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io async_io............... [NO] ...................... [NO][NO] ....... [NO] transformer_inference .. [NO] ....... transformer_inference[OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. .. [NO] ....... utils[OKAY] async_io ...............async_io [NO] ...................... [NO][NO] .................. [YES] ...... [OKAY] utils ..................quantizer [YES].............. ......[NO] [OKAY]....... [OKAY] ....... [NO] quantizer ..............-------------------------------------------------- [NO] ....... [OKAY] -------------------------------------------------- transformer_inference .. 
[NO] transformer_inference....... ..[OKAY] [NO] ....... [OKAY] utils .................. [YES] ...... utils[OKAY] .................. [YES]quantizer .................... [OKAY][NO] ....... [OKAY] quantizer .............. --------------------------------------------------[NO] ....... [OKAY] -------------------------------------------------- -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] /bin/sh: line 0: type: git: not found -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. 
Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja **** Git info for Megatron: git_hash=unknown git_branch=unknown **** -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... 
[OKAY] stochastic_transformer . [NO] ....... [OKAY] DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. 
Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] /bin/sh: line 0: type: git: not found utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- -------------------------------------------------- DeepSpeed C++/CUDA extension op report **** Git info for Megatron: git_hash=unknown git_branch=unknown **** -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 
1.8.1
torch cuda version ............... 11.1
nvcc version ..................... 11.2
deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']
deepspeed info ................... 0.4.2+bc17042, bc17042, big-science
deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1

--------------------------------------------------
DeepSpeed C++/CUDA extension op report
--------------------------------------------------
NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.
--------------------------------------------------
JIT compiled ops requires ninja

ninja .................. [OKAY]
--------------------------------------------------
op name ................ installed .. compatible
--------------------------------------------------
cpu_adam ............... [YES] ...... [OKAY]
fused_adam ............. [NO] ....... [OKAY]
fused_lamb ............. [NO] ....... [OKAY]
sparse_attn ............ [NO] ....... [OKAY]
transformer ............ [NO] ....... [OKAY]
stochastic_transformer . [NO] ....... [OKAY]
--------------------------------------------------

 [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`.
async_io ............... [NO] ....... [NO]
transformer_inference .. [NO] ....... [OKAY]
utils .................. [YES] ...... [OKAY]
quantizer .............. [NO] ....... [OKAY]
--------------------------------------------------

DeepSpeed general environment info:
torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']
torch version .................... 1.8.1
torch cuda version ............... 11.1
nvcc version ..................... 11.2
deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']
deepspeed info ................... 0.4.2+bc17042, bc17042, big-science
deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1

/bin/sh: line 0: type: git: not found
**** Git info for Megatron: git_hash=unknown git_branch=unknown ****
['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] DeepSpeed general environment info:torch version op name ................ installed .. compatible torch version .................... 1.8.1 .................... 1.8.1 -------------------------------------------------- torch cuda version ............... 11.1 nvcc version ..................... 11.2 stochastic_transformer . [NO] ....... [OKAY] torch cuda versiontorch install path ............... ...............11.1 nvcc version ..................... 11.2 cpu_adam ............... [YES] ...... [OKAY] deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']deepspeed install path ........... torch version['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] fused_adam ............. [NO] ....... [OKAY] .................... 1.8.1 torch cuda version ............... 11.1deepspeed info fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] nvcc version................... ..................... 11.2 deepspeed install path 0.4.2+bc17042, bc17042, big-science........... deepspeed wheel compiled w.['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science...... transformer ............ [NO] ....... [OKAY] deepspeed wheel compiled w.torch 1.8, cuda 11.1 ...... torch 1.8, cuda 11.1 stochastic_transformer . [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. 
Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] ninja .................. 
[OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- ninjacpu_adam ................................. [OKAY][YES] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja DeepSpeed general environment info: ......-------------------------------------------------- [OKAY]op name torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 ................ installed .. compatible torch cuda version ............... 11.1 -------------------------------------------------- fused_adam ............. [NO] ....... [OKAY] nvcc version ..................... 11.2 cpu_adam fused_lamb............... [YES]............. ...... [NO][OKAY] deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science ....... [OKAY] deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 fused_adam ............. sparse_attn[NO] ................... [OKAY][NO] ....... [OKAY]fused_lamb ............. [NO]transformer ....... ............[OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. [NO] ....... [OKAY] async_io ............... [NO] ....... [NO] stochastic_transformer . [NO] sparse_attn....... ............[OKAY] [NO] ....... [OKAY] transformer_inference .. [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] utils .................. [YES] ...... 
[OKAY] stochastic_transformer . [NO] ....... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed general environment info:  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. 
Can be fixed by: `apt install libaio-dev`. torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 async_io ............... [NO] ....... [NO] torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] transformer_inference .. [NO] ....... [OKAY] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science utils .................. [YES] ...... [OKAY] deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed general environment info: torch install path ............... 
DeepSpeed general environment info:['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch install pathtorch cuda version .............................. 11.1 nvcc version ..................... 11.2['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] deepspeed install path ...........torch version ....................['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] 1.8.1deepspeed info ...................torch cuda version 0.4.2+bc17042, bc17042, big-science............... deepspeed wheel compiled w.11.1 ......nvcc version torch 1.8, cuda 11.1..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja **** Git info for Megatron: git_hash=unknown git_branch=unknown ****  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... 
[OKAY] -------------------------------------------------- /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown ******** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. JIT compiled ops requires ninja async_io ............... [NO]async_io ....... ...............[NO] [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] transformer_inference utils.. [NO] ....... [OKAY] .................. [YES] ...... utils[OKAY] .................. [YES] ......quantizer [OKAY].............. [NO] ....... 
[OKAY]quantizer .............. --------------------------------------------------[NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. ninja .................. [OKAY] async_io ............... [NO] ....... [NO] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- transformer_inference .. [NO] ....... [OKAY] cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] fused_lamb ............. [NO] ....... [OKAY] quantizer .............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] -------------------------------------------------- transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja /bin/sh: line 0: type: git: not found  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`.  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. 
async_io async_io............... [NO] ...................... [NO][NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] transformer_inference .. [NO]utils ......................... [OKAY][YES] ...... [OKAY] utils ..................quantizer [YES].............. ......[NO] [OKAY]....... [OKAY] quantizer-------------------------------------------------- .............. [NO] ....... [OKAY] -------------------------------------------------- **** Git info for Megatron: git_hash=unknown git_branch=unknown **** ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** ---------------------------------------------------------------------------------------------------- DeepSpeed C++/CUDA extension op reportDeepSpeed C++/CUDA extension op report -------------------------------------------------- --------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. --------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. 
JIT compiled ops requires ninja -------------------------------------------------- JIT compiled ops requires ninja DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info:DeepSpeed general environment info: torch install pathtorch install path .............................. ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch versiontorch version ........................................ 1.8.11.8.1 torch cuda versiontorch cuda version .............................. 11.111.1 nvcc versionnvcc version .......................................... 11.211.2 deepspeed install pathdeepspeed install path ...................... 
['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed infodeepspeed info ...................................... 0.4.2+bc17042, bc17042, big-science0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w.deepspeed wheel compiled w. ............ torch 1.8, cuda 11.1torch 1.8, cuda 11.1 ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES]ninja ...... ..................[OKAY] [OKAY] -------------------------------------------------- op name ................ installed ..fused_adam compatible............. --------------------------------------------------[NO] ....... [OKAY] cpu_adamfused_lamb ............... .............[YES] [NO]...... .......[OKAY] [OKAY] fused_adamsparse_attn ......................... [NO] [NO]....... .......[OKAY] [OKAY] fused_lambtransformer ......................... [NO] [NO]....... .......[OKAY] [OKAY] stochastic_transformer . [NO] ....... sparse_attn ............ [OKAY][NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`.  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] transformer_inference .. [NO] ....... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- utils .................. [YES] ...... 
[OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown ****  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... async_io[NO] ....... ...............[NO] [NO] ....... [NO] transformer_inference .. [NO]transformer_inference ......... [OKAY][NO] ....... [OKAY] utils .................. [YES] utils...... ..................[OKAY] [YES] ...... [OKAY]quantizer .............. [NO] .......quantizer [OKAY].............. [NO] .......-------------------------------------------------- [OKAY] -------------------------------------------------- DeepSpeed general environment info:DeepSpeed general environment info: torch install pathtorch install path .............................. ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch versiontorch version ........................................ 1.8.11.8.1 torch cuda version torch cuda version............... ...............11.1 11.1nvcc version nvcc version..................... .....................11.2 11.2deepspeed install path deepspeed install path........... ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed infodeepspeed info ...................................... 0.4.2+bc17042, bc17042, big-science0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w.deepspeed wheel compiled w. ............ 
torch 1.8, cuda 11.1torch 1.8, cuda 11.1 -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_ioasync_io .............................. [NO][NO] .............. [NO][NO] transformer_inference .. [NO] transformer_inference....... ..[OKAY] [NO] ....... [OKAY] utils .................. [YES] utils...... ..................[OKAY] [YES] ......quantizer [OKAY].............. [NO] .......quantizer [OKAY].............. [NO] .......-------------------------------------------------- [OKAY] -------------------------------------------------- DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... 
['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... 
torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 
0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- **** Git info for Megatron: git_hash=unknown git_branch=unknown ******** Git info for Megatron: git_hash=unknown git_branch=unknown **** ninja .................. [OKAY] ninja-------------------------------------------------- .................. op name[OKAY] ................ installed-------------------------------------------------- .. op namecompatible ................-------------------------------------------------- installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] .......fused_adam [OKAY]............. [NO] ....... fused_lamb[OKAY] ............. [NO] fused_lamb....... .............[OKAY] [NO] ....... [OKAY] sparse_attn ............ [NO] sparse_attn....... ............[OKAY] [NO] .......transformer [OKAY]............ [NO] ....... transformer[OKAY] ............ [NO] ....... [OKAY]stochastic_transformer .stochastic_transformer [NO] ........ [OKAY][NO] ....... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. 
[OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_ioasync_io .............................. [NO][NO] .............. [NO][NO] transformer_inference ..transformer_inference [NO].. .......[NO] [OKAY]....... [OKAY] utils utils.................. ..................[YES] [YES]...... ......[OKAY] [OKAY] quantizerquantizer ............................ [NO][NO] .............. [OKAY][OKAY] ---------------------------------------------------------------------------------------------------- -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. 
-------------------------------------------------- JIT compiled ops requires ninja DeepSpeed general environment info:DeepSpeed general environment info: torch install pathtorch install path ............... ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch versiontorch version ........................................ 1.8.11.8.1 torch cuda versiontorch cuda version .............................. 11.111.1 nvcc versionnvcc version .......................................... 11.211.2 deepspeed install pathdeepspeed install path ...................... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed infodeepspeed info ...................................... 0.4.2+bc17042, bc17042, big-science0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w.deepspeed wheel compiled w. ............ torch 1.8, cuda 11.1torch 1.8, cuda 11.1 ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] DeepSpeed general environment info: sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... 
[OKAY] torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] stochastic_transformer . [NO] ....... [OKAY] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... 
torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown ****  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja /bin/sh: line 0: type: git: not found  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] **** Git info for Megatron: git_hash=unknown git_branch=unknown **** quantizer .............. [NO] ....... 
[OKAY] -------------------------------------------------- -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY][0m -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown ******** Git info for Megatron: git_hash=unknown git_branch=unknown **** ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown ****  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... 
[OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja DeepSpeed general environment info:DeepSpeed general environment info: torch install pathtorch install path .............................. ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] /bin/sh: line 0: type: git: not found  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. torch versiontorch version ........................................ 1.8.11.8.1 torch cuda versiontorch cuda version .............................. 11.111.1 async_io ............... [NO] ....... [NO] nvcc versionnvcc version .......................................... 11.211.2 transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... 
using world size: 256, data-parallel-size: 8, tensor-model-parallel size: 4, pipeline-model-parallel size: 8
using torch.float16 for parameters ...
------------------------ arguments ------------------------
  accumulate_allreduce_grads_in_fp32 .............. False
  adam_beta1 ...................................... 0.9
  adam_beta2 ...................................... 0.999
  adam_eps ........................................ 1e-08
  adlr_autoresume ................................. False
  adlr_autoresume_interval ........................ 1000
  apply_query_key_layer_scaling ................... True
  apply_residual_connection_post_layernorm ........ False
  attention_dropout ............................... 0.1
  attention_softmax_in_fp32 ....................... False
  bert_binary_head ................................ True
  bert_load ....................................... None
  bf16 ............................................ False
  bias_dropout_fusion ............................. True
  bias_gelu_fusion ................................ True
  biencoder_projection_dim ........................ 0
  biencoder_shared_query_context_model ............ False
  block_data_path ................................. None
  checkpoint_activations .......................... True
  checkpoint_in_cpu ............................... False
  checkpoint_num_layers ........................... 1
  clip_grad ....................................... 1.0
  codecarbon_dir .................................. /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/tr8-104B-logs/codecarbon
  consumed_train_samples .......................... 0
  consumed_valid_samples .......................... 0
  contigious_checkpointing ........................ False
  cpu_optimizer ................................... False
  cpu_torch_adam .................................. False
  data_impl ....................................... mmap
  data_parallel_size .............................. 8
  data_path ....................................... ['/gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document']
  dataloader_type ................................. single
  DDP_impl ........................................ local
  decoder_seq_length .............................. None
  deepscale ....................................... False
  deepscale_config ................................ None
  deepspeed ....................................... True
  deepspeed_activation_checkpointing .............. True
  deepspeed_config ................................ ./ds_config.1164492.json
  deepspeed_mpi ................................... False
  distribute_checkpointed_activations ............. False
  distributed_backend ............................. nccl
  embedding_path .................................. None
  encoder_seq_length .............................. 2048
  eod_mask_loss ................................... False
  eval_interval ................................... 1000
  eval_iters ...................................... 5
  evidence_data_path .............................. None
  exit_duration_in_mins ........................... 1190
  exit_interval ................................... None
  ffn_hidden_size ................................. 20480
  finetune ........................................ False
  fp16 ............................................ True
  fp16_lm_cross_entropy ........................... False
  fp32_residual_connection ........................ False
  global_batch_size ............................... 2048
  hidden_dropout .................................. 0.1
  hidden_size ..................................... 16384
  hysteresis ...................................... 2
  ict_head_size ................................... None
  ict_load ........................................ None
  img_dim ......................................... 224
  indexer_batch_size .............................. 128
  indexer_log_interval ............................ 1000
  init_method_std ................................. 0.02
  init_method_xavier_uniform ...................... False
  initial_loss_scale .............................. 4294967296
  kv_channels ..................................... 512
  layernorm_epsilon ............................... 1e-05
  lazy_mpu_init ................................... None
  load ............................................ /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints
  local_rank ...................................... 0
  log_batch_size_to_tensorboard ................... True
  log_interval .................................... 1
  log_learning_rate_to_tensorboard ................ True
  log_loss_scale_to_tensorboard ................... True
  log_num_zeros_in_grad ........................... False
  log_params_norm ................................. False
  log_timers_to_tensorboard ....................... True
  log_validation_ppl_to_tensorboard ............... True
  loss_scale ...................................... 12.0
  loss_scale_window ............................... 1000
  lr .............................................. 6e-05
  lr_decay_iters .................................. None
  lr_decay_samples ................................ 126953125
  lr_decay_style .................................. cosine
  lr_warmup_fraction .............................. None
  lr_warmup_iters ................................. 0
  lr_warmup_samples ............................... 216320
  make_vocab_size_divisible_by .................... 128
  mask_prob ....................................... 0.15
  masked_softmax_fusion ........................... True
  max_position_embeddings ......................... 2048
  memory_centric_tiled_linear ..................... False
  merge_file ...................................... /gpfswork/rech/six/commun/code/tr8-104B/Megatron-DeepSpeed-tr8-104B/data/gpt2-merges.txt
  micro_batch_size ................................ 1
  min_loss_scale .................................. 1.0
  min_lr .......................................... 6e-06
  mmap_warmup ..................................... False
  no_load_optim ................................... None
  no_load_rng ..................................... None
  no_save_optim ................................... None
  no_save_rng ..................................... None
  num_attention_heads ............................. 32
  num_channels .................................... 3
  num_classes ..................................... 1000
  num_layers ...................................... 32
  num_layers_per_virtual_pipeline_stage ........... None
  num_workers ..................................... 2
  onnx_safe ....................................... None
  openai_gelu ..................................... False
  optimizer ....................................... adam
  override_lr_scheduler ........................... False
  params_dtype .................................... torch.float16
  partition_activations ........................... False
  patch_dim ....................................... 16
  pipeline_model_parallel_size .................... 8
  position_embedding_type ......................... PositionEmbeddingType.absolute
  profile_backward ................................ False
  query_in_block_prob ............................. 0.1
  rampup_batch_size ............................... ['16', '16', '6_000_000']
  rank ............................................ 0
  remote_device ................................... none
  reset_attention_mask ............................ False
  reset_position_ids .............................. False
  retriever_report_topk_accuracies ................ []
  retriever_score_scaling ......................... False
  retriever_seq_length ............................ 256
  sample_rate ..................................... 1.0
  save ............................................ /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints
  save_interval ................................... 1500
  scatter_gather_tensors_in_pipeline .............. True
  scattered_embeddings ............................ False
  seed ............................................ 42
  seq_length ...................................... 2048
  sgd_momentum .................................... 0.9
  short_seq_prob .................................. 0.1
  split ........................................... 949,50,1
  split_transformers .............................. False
  synchronize_each_layer .......................... False
  tensor_model_parallel_size ...................... 4
  tensorboard_dir ................................. /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/tr8-104B-logs/tensorboard
  tensorboard_log_interval ........................ 1
  tensorboard_queue_size .......................... 5
  tile_factor ..................................... 1
  titles_data_path ................................ None
  tokenizer_name_or_path .......................... None
  tokenizer_type .................................. GPT2BPETokenizer
  train_iters ..................................... None
  train_samples ................................... 300000000
  use_checkpoint_lr_scheduler ..................... False
  use_contiguous_buffers_in_ddp ................... False
  use_cpu_initialization .......................... None
  use_one_sent_docs ............................... False
  use_pin_memory .................................. False
  virtual_pipeline_model_parallel_size ............ None
  vocab_extra_ids ................................. 0
  vocab_file ...................................... /gpfswork/rech/six/commun/code/tr8-104B/Megatron-DeepSpeed-tr8-104B/data/gpt2-vocab.json
  weight_decay .................................... 0.1
  world_size ...................................... 256
  zero_allgather_bucket_size ...................... 0.0
  zero_contigious_gradients ....................... False
  zero_reduce_bucket_size ......................... 0.0
  zero_reduce_scatter ............................. False
  zero_stage ...................................... 1
-------------------- end of arguments ---------------------
will use batch size rampup starting from global batch size 16 to global batch size 2048 with batch size increments 16 over 6000000 samples.
> building GPT2BPETokenizer tokenizer ...
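The topology and rampup figures in the log above are internally consistent. The following plain-Python sketch (an illustration of the arithmetic only, not Megatron-DeepSpeed's actual scheduler code) checks them:

```python
# Sanity-check sketch for two figures reported in the launch log above.

# 1. Parallel topology: world size = data-parallel x tensor-parallel x pipeline-parallel.
data_parallel, tensor_parallel, pipeline_parallel = 8, 4, 8
world_size = data_parallel * tensor_parallel * pipeline_parallel
print(world_size)  # matches "using world size: 256"

# 2. Batch-size rampup: rampup_batch_size = ['16', '16', '6_000_000'] means the
#    global batch size starts at 16 and grows in increments of 16 until it
#    reaches global_batch_size = 2048, spread over 6,000,000 samples.
start_bs, increment, rampup_samples = 16, 16, 6_000_000
final_bs = 2048
num_increments = (final_bs - start_bs) // increment  # number of +16 steps
samples_per_increment = rampup_samples / num_increments
print(num_increments, round(samples_per_increment))
```

With these numbers the rampup takes 127 increments, i.e. the global batch size steps up by 16 roughly every 47k samples until it reaches 2048.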
> setting tensorboard ...
0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. 
[NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. fused_adam ............. [NO] ....... [OKAY] async_io ............... [NO] ....... [NO] fused_lamb ............. [NO] ....... [OKAY] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] sparse_attn ............ [NO] ....... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. 
[YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... 
[OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`.  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io async_io............... [NO]............... .......[NO] [NO]....... [NO] transformer_inferencetransformer_inference .... [NO][NO] .............. [OKAY][OKAY] utils utils.................. ..................[YES] [YES]...... ......[OKAY] [OKAY] quantizerquantizer ............................ [NO][NO] .............. [OKAY][OKAY] ---------------------------------------------------------------------------------------------------- /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found DeepSpeed general environment info: **** Git info for Megatron: git_hash=unknown git_branch=unknown **** torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... 
['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... DeepSpeed general environment info: ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch install path torch version............... .................... 1.8.1 torch cuda version ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']............... 11.1 nvcc versiontorch version ......................................... 11.21.8.1 deepspeed install path torch cuda version........... ............... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']11.1 deepspeed infonvcc version ........................................ 0.4.2+bc17042, bc17042, big-science11.2 deepspeed wheel compiled w.deepspeed install path ................. torch 1.8, cuda 11.1['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`.  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] transformer_inference .. [NO]utils ......................... [OKAY][YES] ...... [OKAY] quantizerutils ................................ 
[NO][YES] ............. [OKAY][OKAY] -------------------------------------------------- quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed general environment info:DeepSpeed general environment info: torch install pathtorch install path ............... ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version ....................torch version 1.8.1.................... 1.8.1 torch cuda version ...............torch cuda version 11.1............... nvcc version 11.1..................... nvcc version11.2 .....................deepspeed install path 11.2........... deepspeed install path ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']........... deepspeed info ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']................... deepspeed info0.4.2+bc17042, bc17042, big-science ...................deepspeed wheel compiled w. ...... 0.4.2+bc17042, bc17042, big-sciencetorch 1.8, cuda 11.1 deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown ****  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... 
[NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] DeepSpeed general environment info:DeepSpeed general environment info: torch version .................... 1.8.1 torch install pathtorch install path .............................. ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch versiontorch version ........................................ 
1.8.11.8.1 torch cuda version ............... 11.1 torch cuda versiontorch cuda version .............................. 11.111.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 nvcc versionnvcc version .......................................... 11.211.2 deepspeed install pathdeepspeed install path ...................... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed infodeepspeed info ...................................... 0.4.2+bc17042, bc17042, big-science0.4.2+bc17042, bc17042, big-science DeepSpeed general environment info: deepspeed wheel compiled w.deepspeed wheel compiled w. ............ torch 1.8, cuda 11.1torch 1.8, cuda 11.1 torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 DeepSpeed general environment info: torch cuda version ............... 11.1 nvcc version ..................... 11.2 torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] deepspeed install path ........... 
['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science torch version .................... 1.8.1 /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info:DeepSpeed general environment info: torch install pathtorch install path .............................. ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch versiontorch version ........................................ 1.8.11.8.1 torch cuda versiontorch cuda version .............................. 11.111.1 nvcc versionnvcc version .......................................... 11.211.2 deepspeed install pathdeepspeed install path ...................... 
['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed infodeepspeed info ...................................... 0.4.2+bc17042, bc17042, big-science0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w.deepspeed wheel compiled w. ............ torch 1.8, cuda 11.1torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ---------------------------------------------------------------------------------------------------- DeepSpeed C++/CUDA extension op report --------------------------------------------------DeepSpeed C++/CUDA extension op report NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.-------------------------------------------------- --------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. 
Op compatibility means that your system meet the required dependencies to JIT install the op. JIT compiled ops requires ninja-------------------------------------------------- JIT compiled ops requires ninja /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info: DeepSpeed general environment info: torch install path ............... torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] .................... 1.8.1 torch version ....................torch cuda version 1.8.1............... 11.1 torch cuda versionnvcc version .................................... 11.111.2 nvcc versiondeepspeed install path ................................ 11.2 ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']deepspeed install path deepspeed info........... ................... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']0.4.2+bc17042, bc17042, big-science deepspeed infodeepspeed wheel compiled w. ......................... 0.4.2+bc17042, bc17042, big-sciencetorch 1.8, cuda 11.1 deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** ninja ninja.................. [OKAY].................. [OKAY]-------------------------------------------------- --------------------------------------------------op name ................op name installed................ .. installedcompatible .. 
--------------------------------------------------compatible -------------------------------------------------- cpu_adam ...............cpu_adam [YES] ..................... [YES][OKAY] ...... [OKAY] ninja .................. [OKAY] fused_adam-------------------------------------------------- fused_adam.............op name [NO]............................. installed.......[NO] [OKAY]......... compatible[OKAY] fused_lamb --------------------------------------------------............. [NO]fused_lamb .................... [OKAY][NO] cpu_adam.......ninja ............... [OKAY]..................[YES] [OKAY]......sparse_attn [OKAY] ............ --------------------------------------------------[NO] .......op name sparse_attn[OKAY]................ fused_adam installed......................... ..transformer[NO][NO] .......................... [OKAY][NO][OKAY]compatible .......transformer --------------------------------------------------[OKAY]fused_lamb............ .............[NO] stochastic_transformer[NO]....... .......[OKAY] . [OKAY] [NO]cpu_adamstochastic_transformer ...................... . [OKAY] sparse_attn[YES] [NO] .................. [NO][OKAY]....... [OKAY] ....... [OKAY] transformer ............ [NO] ....... [OKAY] fused_adamstochastic_transformer .............. [NO][NO] .............. [OKAY] [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... 
[NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 /bin/sh: line 0: type: git: not found deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown ****  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] /bin/sh: line 0: type: git: not found utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] **** Git info for Megatron: git_hash=unknown git_branch=unknown **** torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 /bin/sh: line 0: type: git: not found deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 
[identical startup output emitted by every rank; deduplicated to one copy of each block]

/bin/sh: line 0: type: git: not found
**** Git info for Megatron: git_hash=unknown git_branch=unknown ****

DeepSpeed general environment info:
torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']
torch version .................... 1.8.1
torch cuda version ............... 11.1
nvcc version ..................... 11.2
deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']
deepspeed info ................... 0.4.2+bc17042, bc17042, big-science
deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1

--------------------------------------------------
DeepSpeed C++/CUDA extension op report
--------------------------------------------------
NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.
--------------------------------------------------
JIT compiled ops requires ninja
--------------------------------------------------

 [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`.
async_io ............... [NO] ....... [NO]
transformer_inference .. [NO] ....... [OKAY]
utils .................. [YES] ...... [OKAY]
quantizer .............. [NO] ....... [OKAY]
--------------------------------------------------

ninja .................. [OKAY]
--------------------------------------------------
op name ................ installed .. compatible
--------------------------------------------------
cpu_adam ............... [YES] ...... [OKAY]
fused_adam ............. [NO] ....... [OKAY]
fused_lamb ............. [NO] ....... [OKAY]
sparse_attn ............ [NO] ....... [OKAY]
transformer ............ [NO] ....... [OKAY]
stochastic_transformer . [NO] ....... [OKAY]
--------------------------------------------------

> padded vocab (size: 50257) with 431 dummy tokens (new size: 50688)
torch version0.4.2+bc17042, bc17042, big-science ....................deepspeed wheel compiled w. 1.8.1...... torch 1.8, cuda 11.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown ****  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. /bin/sh: line 0: type: git: not found async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown ****  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... 
[OKAY] -------------------------------------------------- /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info:DeepSpeed general environment info: torch install pathtorch install path .............................. ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch versiontorch version ........................................ 1.8.11.8.1 torch cuda versiontorch cuda version .............................. 11.111.1 nvcc versionnvcc version .......................................... 11.211.2 deepspeed install pathdeepspeed install path ...................... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed infodeepspeed info ...................................... 0.4.2+bc17042, bc17042, big-science0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w.deepspeed wheel compiled w. ............ torch 1.8, cuda 11.1torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown ****  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. 
async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown ******** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown ******** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 
11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found DeepSpeed general environment info:DeepSpeed general environment info: **** Git info for Megatron: git_hash=unknown git_branch=unknown **** torch install pathtorch install path .............................. ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch versiontorch version ........................................ 1.8.11.8.1 **** Git info for Megatron: git_hash=unknown git_branch=unknown **** torch cuda versiontorch cuda version .............................. 11.111.1 nvcc versionnvcc version .......................................... 11.211.2 deepspeed install pathdeepspeed install path ........... ........... 
['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed infodeepspeed info ...................................... 0.4.2+bc17042, bc17042, big-science0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w.deepspeed wheel compiled w. ............ torch 1.8, cuda 11.1torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 
1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... 
torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... 
torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found > setting codecarbon ... **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. 
Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... ninja[YES] ...... ..................[OKAY] [OKAY] -------------------------------------------------- op name ................ installed .. fused_adamcompatible ............. --------------------------------------------------[NO] ....... [OKAY] fused_lamb ............. cpu_adam[NO] ...................... [OKAY][YES] ...... [OKAY] sparse_attnfused_adam ......................... [NO][NO] ....... .......[OKAY] [OKAY] transformer ............fused_lamb [NO]............. ....... [NO][OKAY] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... 
[OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown ****  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_ioasync_io .............................. [NO][NO] .............. [NO][NO] transformer_inferencetransformer_inference .. [NO] ......... [NO] [OKAY]....... [OKAY] utils .................. [YES] utils...... ..................[OKAY] [YES] ......quantizer [OKAY].............. [NO] ....... quantizer[OKAY] .............. --------------------------------------------------[NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`.  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. 
async_io ............... [NO] ....... [NO] async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] transformer_inference .. [NO] utils....... ..................[OKAY] [YES] ...... [OKAY] utils quantizer.................. ..............[YES] [NO]...... .......[OKAY] [OKAY] quantizer-------------------------------------------------- .............. [NO] ....... [OKAY] -------------------------------------------------- > initializing torch distributed ... DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info:DeepSpeed general environment info: torch install pathtorch install path .............................. ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch versiontorch version ........................................ 1.8.11.8.1 torch cuda versiontorch cuda version .............................. 
11.111.1 nvcc versionnvcc version .......................................... 11.211.2 deepspeed install pathdeepspeed install path ...................... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed infodeepspeed info ...................................... 0.4.2+bc17042, bc17042, big-science0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w.deepspeed wheel compiled w. ............ torch 1.8, cuda 11.1torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninjaninja .................................... [OKAY][OKAY] ---------------------------------------------------------------------------------------------------- op nameop name ................................ installedinstalled .... compatiblecompatible ---------------------------------------------------------------------------------------------------- cpu_adamcpu_adam .............................. 
[YES][YES] ............ [OKAY][OKAY] fused_adamfused_adam .......................... [NO][NO] .............. [OKAY][OKAY] fused_lambfused_lamb .......................... [NO][NO] .............. [OKAY][OKAY] sparse_attnsparse_attn ........................ [NO][NO] .............. [OKAY][OKAY] transformertransformer ........................ [NO][NO] .............. [OKAY][OKAY] stochastic_transformerstochastic_transformer .. [NO][NO] .............. [OKAY][OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... 
[OKAY]
fused_lamb ............. [NO] ....... [OKAY]
sparse_attn ............ [NO] ....... [OKAY]
transformer ............ [NO] ....... [OKAY]
stochastic_transformer . [NO] ....... [OKAY]
/bin/sh: line 0: type: git: not found
**** Git info for Megatron: git_hash=unknown git_branch=unknown ****
 [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`.
async_io ............... [NO] ....... [NO]
transformer_inference .. [NO] ....... [OKAY]
utils .................. [YES] ...... [OKAY]
quantizer .............. [NO] ....... [OKAY]
--------------------------------------------------
--------------------------------------------------
DeepSpeed C++/CUDA extension op report
--------------------------------------------------
NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.
--------------------------------------------------
JIT compiled ops requires ninja
ninja .................. [OKAY]
--------------------------------------------------
op name ................ installed .. compatible
--------------------------------------------------
cpu_adam ............... [YES] ...... [OKAY]
fused_adam ............. [NO] ....... [OKAY]
fused_lamb ............. [NO] ....... [OKAY]
sparse_attn ............ [NO] ....... [OKAY]
transformer ............ [NO] ....... [OKAY]
stochastic_transformer . [NO] ....... [OKAY]
DeepSpeed general environment info:
torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']
torch version .................... 1.8.1
torch cuda version ............... 11.1
nvcc version ..................... 11.2
deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']
deepspeed info ................... 0.4.2+bc17042, bc17042, big-science
deepspeed wheel compiled w. ......
torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** > initializing tensor model parallel with size 4 > initializing pipeline model parallel with size 8 > setting random seeds to 42 ... [2021-09-24 05:52:24,592] [INFO] [checkpointing.py:226:model_parallel_cuda_manual_seed] > initializing model parallel cuda seeds on global rank 0, model parallel rank 0, and data parallel rank 0 with model parallel seed: 2760 and data parallel seed: 42 > compiling dataset index builder ... make: Entering directory '/gpfsssd/worksf/projects/rech/six/commun/code/tr8-104B/Megatron-DeepSpeed-tr8-104B/megatron/data' make: Nothing to be done for 'default'. make: Leaving directory '/gpfsssd/worksf/projects/rech/six/commun/code/tr8-104B/Megatron-DeepSpeed-tr8-104B/megatron/data' >>> done with dataset index builder. Compilation time: 0.299 seconds > compiling and loading fused kernels ... /gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning: !! WARNING !! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension. See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source. !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !! WARNING !! 
warnings.warn(WRONG_COMPILER_WARNING.format(
Detected CUDA files, patching ldflags
Emitting ninja build file /gpfsssd/worksf/projects/rech/six/commun/code/tr8-104B/Megatron-DeepSpeed-tr8-104B/megatron/fused_kernels/build/build.ninja...
Building extension module scaled_upper_triang_masked_softmax_cuda...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
Loading extension module scaled_upper_triang_masked_softmax_cuda...
Detected CUDA files, patching ldflags
Emitting ninja build file /gpfsssd/worksf/projects/rech/six/commun/code/tr8-104B/Megatron-DeepSpeed-tr8-104B/megatron/fused_kernels/build/build.ninja...
Building extension module scaled_masked_softmax_cuda...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
Loading extension module scaled_masked_softmax_cuda...
Detected CUDA files, patching ldflags
Emitting ninja build file /gpfsssd/worksf/projects/rech/six/commun/code/tr8-104B/Megatron-DeepSpeed-tr8-104B/megatron/fused_kernels/build/build.ninja...
Building extension module fused_mix_prec_layer_norm_cuda...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
Loading extension module fused_mix_prec_layer_norm_cuda...
>>> done with compiling and loading fused kernels. Compilation time: 19.795 seconds
time to initialize megatron (seconds): 12.702
[after megatron is initialized] datetime: 2021-09-24 05:52:44
building GPT model ...
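The `WRONG_COMPILER_WARNING` repeated above is PyTorch noticing that the JIT extension builds default to plain `c++` while PyTorch itself was built with `g++`. A possible workaround (an assumption, not something shown in this log): `torch.utils.cpp_extension` generally takes its compiler from the `CXX` environment variable, so exporting it before launching the job should make the fused-kernel builds use `g++` and silence the warning.

```shell
# Assumption: torch.utils.cpp_extension reads the compiler from the CXX
# environment variable, falling back to plain `c++` when it is unset.
# Exporting CXX=g++ before launch should therefore avoid the
# compiler-mismatch warning on this platform.
export CXX=g++
echo "CXX=$CXX"
```

Since the compiled kernels are cached under `megatron/fused_kernels/build` (the directory shown in the ninja output above), forcing a rebuild with the new compiler may also require removing that cache directory first.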
[2021-09-24 05:52:44,769] [INFO] [utils.py:680:see_memory_usage] Before Building Model
/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch/cuda/memory.py:373: FutureWarning: torch.cuda.memory_cached has been renamed to torch.cuda.memory_reserved
warnings.warn(
/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch/cuda/memory.py:381: FutureWarning: torch.cuda.max_memory_cached has been renamed to torch.cuda.max_memory_reserved
warnings.warn(
[2021-09-24 05:52:44,770] [INFO] [utils.py:681:see_memory_usage] MA 0.0 GB Max_MA 0.0 GB CA 0.0 GB Max_CA 0 GB
[2021-09-24 05:52:44,770] [INFO] [utils.py:689:see_memory_usage] CPU Virtual Memory: used = 37.83 GB, percent = 20.2%
SEED_LAYERS=False BASE_SEED=1234 SEED_FN=None
Using topology: {ProcessCoord(pipe=0, data=0, model=0): 0, ProcessCoord(pipe=0, data=0, model=1): 1, ProcessCoord(pipe=0, data=0, model=2): 2, ProcessCoord(pipe=0, data=0, model=3): 3, ProcessCoord(pipe=0, data=1, model=0): 4, ProcessCoord(pipe=0, data=1, model=1): 5, ProcessCoord(pipe=0, data=1, model=2): 6, ProcessCoord(pipe=0, data=1, model=3): 7, ProcessCoord(pipe=0, data=2, model=0): 8, ProcessCoord(pipe=0, data=2, model=1): 9, ProcessCoord(pipe=0, data=2, model=2): 10, ProcessCoord(pipe=0, data=2, model=3): 11, ProcessCoord(pipe=0, data=3, model=0): 12, ProcessCoord(pipe=0, data=3, model=1): 13, ProcessCoord(pipe=0, data=3, model=2): 14, ProcessCoord(pipe=0, data=3, model=3): 15, ProcessCoord(pipe=0, data=4, model=0): 16, ProcessCoord(pipe=0, data=4, model=1): 17, ProcessCoord(pipe=0, data=4, model=2): 18, ProcessCoord(pipe=0, data=4, model=3): 19, ProcessCoord(pipe=0, data=5, model=0): 20, ProcessCoord(pipe=0, data=5, model=1): 21, ProcessCoord(pipe=0, data=5, model=2): 22, ProcessCoord(pipe=0, data=5, model=3): 23, ProcessCoord(pipe=0, data=6, model=0): 24, ProcessCoord(pipe=0, data=6, model=1): 25, ProcessCoord(pipe=0, data=6, model=2): 26, ProcessCoord(pipe=0, data=6, model=3): 27, ProcessCoord(pipe=0, 
data=7, model=0): 28, ProcessCoord(pipe=0, data=7, model=1): 29, ProcessCoord(pipe=0, data=7, model=2): 30, ProcessCoord(pipe=0, data=7, model=3): 31, ProcessCoord(pipe=1, data=0, model=0): 32, ProcessCoord(pipe=1, data=0, model=1): 33, ProcessCoord(pipe=1, data=0, model=2): 34, ProcessCoord(pipe=1, data=0, model=3): 35, ProcessCoord(pipe=1, data=1, model=0): 36, ProcessCoord(pipe=1, data=1, model=1): 37, ProcessCoord(pipe=1, data=1, model=2): 38, ProcessCoord(pipe=1, data=1, model=3): 39, ProcessCoord(pipe=1, data=2, model=0): 40, ProcessCoord(pipe=1, data=2, model=1): 41, ProcessCoord(pipe=1, data=2, model=2): 42, ProcessCoord(pipe=1, data=2, model=3): 43, ProcessCoord(pipe=1, data=3, model=0): 44, ProcessCoord(pipe=1, data=3, model=1): 45, ProcessCoord(pipe=1, data=3, model=2): 46, ProcessCoord(pipe=1, data=3, model=3): 47, ProcessCoord(pipe=1, data=4, model=0): 48, ProcessCoord(pipe=1, data=4, model=1): 49, ProcessCoord(pipe=1, data=4, model=2): 50, ProcessCoord(pipe=1, data=4, model=3): 51, ProcessCoord(pipe=1, data=5, model=0): 52, ProcessCoord(pipe=1, data=5, model=1): 53, ProcessCoord(pipe=1, data=5, model=2): 54, ProcessCoord(pipe=1, data=5, model=3): 55, ProcessCoord(pipe=1, data=6, model=0): 56, ProcessCoord(pipe=1, data=6, model=1): 57, ProcessCoord(pipe=1, data=6, model=2): 58, ProcessCoord(pipe=1, data=6, model=3): 59, ProcessCoord(pipe=1, data=7, model=0): 60, ProcessCoord(pipe=1, data=7, model=1): 61, ProcessCoord(pipe=1, data=7, model=2): 62, ProcessCoord(pipe=1, data=7, model=3): 63, ProcessCoord(pipe=2, data=0, model=0): 64, ProcessCoord(pipe=2, data=0, model=1): 65, ProcessCoord(pipe=2, data=0, model=2): 66, ProcessCoord(pipe=2, data=0, model=3): 67, ProcessCoord(pipe=2, data=1, model=0): 68, ProcessCoord(pipe=2, data=1, model=1): 69, ProcessCoord(pipe=2, data=1, model=2): 70, ProcessCoord(pipe=2, data=1, model=3): 71, ProcessCoord(pipe=2, data=2, model=0): 72, ProcessCoord(pipe=2, data=2, model=1): 73, ProcessCoord(pipe=2, data=2, model=2): 74, 
ProcessCoord(pipe=2, data=2, model=3): 75, ProcessCoord(pipe=2, data=3, model=0): 76, ProcessCoord(pipe=2, data=3, model=1): 77, ProcessCoord(pipe=2, data=3, model=2): 78, ProcessCoord(pipe=2, data=3, model=3): 79, ProcessCoord(pipe=2, data=4, model=0): 80, ProcessCoord(pipe=2, data=4, model=1): 81, ProcessCoord(pipe=2, data=4, model=2): 82, ProcessCoord(pipe=2, data=4, model=3): 83, ProcessCoord(pipe=2, data=5, model=0): 84, ProcessCoord(pipe=2, data=5, model=1): 85, ProcessCoord(pipe=2, data=5, model=2): 86, ProcessCoord(pipe=2, data=5, model=3): 87, ProcessCoord(pipe=2, data=6, model=0): 88, ProcessCoord(pipe=2, data=6, model=1): 89, ProcessCoord(pipe=2, data=6, model=2): 90, ProcessCoord(pipe=2, data=6, model=3): 91, ProcessCoord(pipe=2, data=7, model=0): 92, ProcessCoord(pipe=2, data=7, model=1): 93, ProcessCoord(pipe=2, data=7, model=2): 94, ProcessCoord(pipe=2, data=7, model=3): 95, ProcessCoord(pipe=3, data=0, model=0): 96, ProcessCoord(pipe=3, data=0, model=1): 97, ProcessCoord(pipe=3, data=0, model=2): 98, ProcessCoord(pipe=3, data=0, model=3): 99, ProcessCoord(pipe=3, data=1, model=0): 100, ProcessCoord(pipe=3, data=1, model=1): 101, ProcessCoord(pipe=3, data=1, model=2): 102, ProcessCoord(pipe=3, data=1, model=3): 103, ProcessCoord(pipe=3, data=2, model=0): 104, ProcessCoord(pipe=3, data=2, model=1): 105, ProcessCoord(pipe=3, data=2, model=2): 106, ProcessCoord(pipe=3, data=2, model=3): 107, ProcessCoord(pipe=3, data=3, model=0): 108, ProcessCoord(pipe=3, data=3, model=1): 109, ProcessCoord(pipe=3, data=3, model=2): 110, ProcessCoord(pipe=3, data=3, model=3): 111, ProcessCoord(pipe=3, data=4, model=0): 112, ProcessCoord(pipe=3, data=4, model=1): 113, ProcessCoord(pipe=3, data=4, model=2): 114, ProcessCoord(pipe=3, data=4, model=3): 115, ProcessCoord(pipe=3, data=5, model=0): 116, ProcessCoord(pipe=3, data=5, model=1): 117, ProcessCoord(pipe=3, data=5, model=2): 118, ProcessCoord(pipe=3, data=5, model=3): 119, ProcessCoord(pipe=3, data=6, model=0): 120, 
ProcessCoord(pipe=3, data=6, model=1): 121, ProcessCoord(pipe=3, data=6, model=2): 122, ProcessCoord(pipe=3, data=6, model=3): 123, ProcessCoord(pipe=3, data=7, model=0): 124, ProcessCoord(pipe=3, data=7, model=1): 125, ProcessCoord(pipe=3, data=7, model=2): 126, ProcessCoord(pipe=3, data=7, model=3): 127, ProcessCoord(pipe=4, data=0, model=0): 128, ProcessCoord(pipe=4, data=0, model=1): 129, ProcessCoord(pipe=4, data=0, model=2): 130, ProcessCoord(pipe=4, data=0, model=3): 131, ProcessCoord(pipe=4, data=1, model=0): 132, ProcessCoord(pipe=4, data=1, model=1): 133, ProcessCoord(pipe=4, data=1, model=2): 134, ProcessCoord(pipe=4, data=1, model=3): 135, ProcessCoord(pipe=4, data=2, model=0): 136, ProcessCoord(pipe=4, data=2, model=1): 137, ProcessCoord(pipe=4, data=2, model=2): 138, ProcessCoord(pipe=4, data=2, model=3): 139, ProcessCoord(pipe=4, data=3, model=0): 140, ProcessCoord(pipe=4, data=3, model=1): 141, ProcessCoord(pipe=4, data=3, model=2): 142, ProcessCoord(pipe=4, data=3, model=3): 143, ProcessCoord(pipe=4, data=4, model=0): 144, ProcessCoord(pipe=4, data=4, model=1): 145, ProcessCoord(pipe=4, data=4, model=2): 146, ProcessCoord(pipe=4, data=4, model=3): 147, ProcessCoord(pipe=4, data=5, model=0): 148, ProcessCoord(pipe=4, data=5, model=1): 149, ProcessCoord(pipe=4, data=5, model=2): 150, ProcessCoord(pipe=4, data=5, model=3): 151, ProcessCoord(pipe=4, data=6, model=0): 152, ProcessCoord(pipe=4, data=6, model=1): 153, ProcessCoord(pipe=4, data=6, model=2): 154, ProcessCoord(pipe=4, data=6, model=3): 155, ProcessCoord(pipe=4, data=7, model=0): 156, ProcessCoord(pipe=4, data=7, model=1): 157, ProcessCoord(pipe=4, data=7, model=2): 158, ProcessCoord(pipe=4, data=7, model=3): 159, ProcessCoord(pipe=5, data=0, model=0): 160, ProcessCoord(pipe=5, data=0, model=1): 161, ProcessCoord(pipe=5, data=0, model=2): 162, ProcessCoord(pipe=5, data=0, model=3): 163, ProcessCoord(pipe=5, data=1, model=0): 164, ProcessCoord(pipe=5, data=1, model=1): 165, 
ProcessCoord(pipe=5, data=1, model=2): 166, ProcessCoord(pipe=5, data=1, model=3): 167, ProcessCoord(pipe=5, data=2, model=0): 168, ProcessCoord(pipe=5, data=2, model=1): 169, ProcessCoord(pipe=5, data=2, model=2): 170, ProcessCoord(pipe=5, data=2, model=3): 171, ProcessCoord(pipe=5, data=3, model=0): 172, ProcessCoord(pipe=5, data=3, model=1): 173, ProcessCoord(pipe=5, data=3, model=2): 174, ProcessCoord(pipe=5, data=3, model=3): 175, ProcessCoord(pipe=5, data=4, model=0): 176, ProcessCoord(pipe=5, data=4, model=1): 177, ProcessCoord(pipe=5, data=4, model=2): 178, ProcessCoord(pipe=5, data=4, model=3): 179, ProcessCoord(pipe=5, data=5, model=0): 180, ProcessCoord(pipe=5, data=5, model=1): 181, ProcessCoord(pipe=5, data=5, model=2): 182, ProcessCoord(pipe=5, data=5, model=3): 183, ProcessCoord(pipe=5, data=6, model=0): 184, ProcessCoord(pipe=5, data=6, model=1): 185, ProcessCoord(pipe=5, data=6, model=2): 186, ProcessCoord(pipe=5, data=6, model=3): 187, ProcessCoord(pipe=5, data=7, model=0): 188, ProcessCoord(pipe=5, data=7, model=1): 189, ProcessCoord(pipe=5, data=7, model=2): 190, ProcessCoord(pipe=5, data=7, model=3): 191, ProcessCoord(pipe=6, data=0, model=0): 192, ProcessCoord(pipe=6, data=0, model=1): 193, ProcessCoord(pipe=6, data=0, model=2): 194, ProcessCoord(pipe=6, data=0, model=3): 195, ProcessCoord(pipe=6, data=1, model=0): 196, ProcessCoord(pipe=6, data=1, model=1): 197, ProcessCoord(pipe=6, data=1, model=2): 198, ProcessCoord(pipe=6, data=1, model=3): 199, ProcessCoord(pipe=6, data=2, model=0): 200, ProcessCoord(pipe=6, data=2, model=1): 201, ProcessCoord(pipe=6, data=2, model=2): 202, ProcessCoord(pipe=6, data=2, model=3): 203, ProcessCoord(pipe=6, data=3, model=0): 204, ProcessCoord(pipe=6, data=3, model=1): 205, ProcessCoord(pipe=6, data=3, model=2): 206, ProcessCoord(pipe=6, data=3, model=3): 207, ProcessCoord(pipe=6, data=4, model=0): 208, ProcessCoord(pipe=6, data=4, model=1): 209, ProcessCoord(pipe=6, data=4, model=2): 210, 
ProcessCoord(pipe=6, data=4, model=3): 211, ProcessCoord(pipe=6, data=5, model=0): 212, ProcessCoord(pipe=6, data=5, model=1): 213, ProcessCoord(pipe=6, data=5, model=2): 214, ProcessCoord(pipe=6, data=5, model=3): 215, ProcessCoord(pipe=6, data=6, model=0): 216, ProcessCoord(pipe=6, data=6, model=1): 217, ProcessCoord(pipe=6, data=6, model=2): 218, ProcessCoord(pipe=6, data=6, model=3): 219, ProcessCoord(pipe=6, data=7, model=0): 220, ProcessCoord(pipe=6, data=7, model=1): 221, ProcessCoord(pipe=6, data=7, model=2): 222, ProcessCoord(pipe=6, data=7, model=3): 223, ProcessCoord(pipe=7, data=0, model=0): 224, ProcessCoord(pipe=7, data=0, model=1): 225, ProcessCoord(pipe=7, data=0, model=2): 226, ProcessCoord(pipe=7, data=0, model=3): 227, ProcessCoord(pipe=7, data=1, model=0): 228, ProcessCoord(pipe=7, data=1, model=1): 229, ProcessCoord(pipe=7, data=1, model=2): 230, ProcessCoord(pipe=7, data=1, model=3): 231, ProcessCoord(pipe=7, data=2, model=0): 232, ProcessCoord(pipe=7, data=2, model=1): 233, ProcessCoord(pipe=7, data=2, model=2): 234, ProcessCoord(pipe=7, data=2, model=3): 235, ProcessCoord(pipe=7, data=3, model=0): 236, ProcessCoord(pipe=7, data=3, model=1): 237, ProcessCoord(pipe=7, data=3, model=2): 238, ProcessCoord(pipe=7, data=3, model=3): 239, ProcessCoord(pipe=7, data=4, model=0): 240, ProcessCoord(pipe=7, data=4, model=1): 241, ProcessCoord(pipe=7, data=4, model=2): 242, ProcessCoord(pipe=7, data=4, model=3): 243, ProcessCoord(pipe=7, data=5, model=0): 244, ProcessCoord(pipe=7, data=5, model=1): 245, ProcessCoord(pipe=7, data=5, model=2): 246, ProcessCoord(pipe=7, data=5, model=3): 247, ProcessCoord(pipe=7, data=6, model=0): 248, ProcessCoord(pipe=7, data=6, model=1): 249, ProcessCoord(pipe=7, data=6, model=2): 250, ProcessCoord(pipe=7, data=6, model=3): 251, ProcessCoord(pipe=7, data=7, model=0): 252, ProcessCoord(pipe=7, data=7, model=1): 253, ProcessCoord(pipe=7, data=7, model=2): 254, ProcessCoord(pipe=7, data=7, model=3): 255} [2021-09-24 
05:52:46,176] [INFO] [module.py:360:_partition_layers] Partitioning pipeline stages with method type:transformer
stage=0 layers=7
     0: _to_float16
     1: EmbeddingPipe
     2:
     3: ParallelTransformerLayerPipe
     4: ParallelTransformerLayerPipe
     5: ParallelTransformerLayerPipe
     6: ParallelTransformerLayerPipe
stage=1 layers=4
     7: ParallelTransformerLayerPipe
     8: ParallelTransformerLayerPipe
     9: ParallelTransformerLayerPipe
    10: ParallelTransformerLayerPipe
stage=2 layers=4
    11: ParallelTransformerLayerPipe
    12: ParallelTransformerLayerPipe
    13: ParallelTransformerLayerPipe
    14: ParallelTransformerLayerPipe
stage=3 layers=4
    15: ParallelTransformerLayerPipe
    16: ParallelTransformerLayerPipe
    17: ParallelTransformerLayerPipe
    18: ParallelTransformerLayerPipe
stage=4 layers=4
    19: ParallelTransformerLayerPipe
    20: ParallelTransformerLayerPipe
    21: ParallelTransformerLayerPipe
    22: ParallelTransformerLayerPipe
stage=5 layers=4
    23: ParallelTransformerLayerPipe
    24: ParallelTransformerLayerPipe
    25: ParallelTransformerLayerPipe
    26: ParallelTransformerLayerPipe
stage=6 layers=4
    27: ParallelTransformerLayerPipe
    28: ParallelTransformerLayerPipe
    29: ParallelTransformerLayerPipe
    30: ParallelTransformerLayerPipe
stage=7 layers=8
    31: ParallelTransformerLayerPipe
    32: ParallelTransformerLayerPipe
    33: ParallelTransformerLayerPipe
    34: ParallelTransformerLayerPipe
    35:
    36: MixedFusedLayerNorm
    37: EmbeddingPipe
    38: float16_to_fp32
  loss: CrossEntropy
 > number of parameters on (tensor, pipeline) model parallel rank (2, 1): 1745293312
 > number of parameters on (tensor, pipeline) model parallel rank (2, 3): 1745293312
 > number of parameters on (tensor, pipeline) model parallel rank (3, 1): 1745293312
 > number of parameters on (tensor, pipeline) model parallel rank (3, 5): 1745293312
 > number of parameters on (tensor, pipeline) model parallel rank (1, 5): 1745293312
 > number of parameters on (tensor, pipeline) model parallel rank (0, 5): 1745293312
 > number of parameters on (tensor, pipeline) model parallel rank (2, 4):
1745293312
 > number of parameters on (tensor, pipeline) model parallel rank (0, 4): 1745293312
 > number of parameters on (tensor, pipeline) model parallel rank (1, 6): 1745293312
 > number of parameters on (tensor, pipeline) model parallel rank (0, 6): 1745293312
 > number of parameters on (tensor, pipeline) model parallel rank (2, 6): 1745293312
 > number of parameters on (tensor, pipeline) model parallel rank (3, 6): 1745293312
 > number of parameters on (tensor, pipeline) model parallel rank (1, 3): 1745293312
 > number of parameters on (tensor, pipeline) model parallel rank (0, 3): 1745293312
 > number of parameters on (tensor, pipeline) model parallel rank (3, 3): 1745293312
 > number of parameters on (tensor, pipeline) model parallel rank (2, 2): 1745293312
 > number of parameters on (tensor, pipeline) model parallel rank (3, 2): 1745293312
 > number of parameters on (tensor, pipeline) model parallel rank (1, 2): 1745293312
 > number of parameters on (tensor, pipeline) model parallel rank (2, 5): 1745293312
 > number of parameters on (tensor, pipeline) model parallel rank (0, 1): 1745293312
 > number of parameters on (tensor, pipeline) model parallel rank (1, 1): 1745293312
 > number of parameters on (tensor, pipeline) model parallel rank (3, 4): 1745293312
 > number of parameters on (tensor, pipeline) model parallel rank (0, 2): 1745293312
 > number of parameters on (tensor, pipeline) model parallel rank (1, 4): 1745293312
 > number of parameters on (tensor, pipeline) model parallel rank (0, 7): 1986498560
 > number of parameters on (tensor, pipeline) model parallel rank (2, 0): 1986465792
 > number of parameters on (tensor, pipeline) model parallel rank (1, 7): 1986498560
 > number of parameters on (tensor, pipeline) model parallel rank (1, 0): 1986465792
 > number of parameters on (tensor, pipeline) model parallel rank (3, 7): 1986498560
 > number of parameters on (tensor, pipeline) model parallel rank (3, 0): 1986465792
 > number of parameters on (tensor, pipeline) model
parallel rank (2, 7): 1986498560
[2021-09-24 05:52:47,386] [INFO] [utils.py:680:see_memory_usage] After Building Model
[2021-09-24 05:52:47,387] [INFO] [utils.py:681:see_memory_usage] MA 3.77 GB Max_MA 3.79 GB CA 3.79 GB Max_CA 4 GB
[2021-09-24 05:52:47,388] [INFO] [utils.py:689:see_memory_usage] CPU Virtual Memory: used = 38.02 GB, percent = 20.3%
 > number of parameters on (tensor, pipeline) model parallel rank (0, 0): 1986465792
setting training iterations to 159576
> learning rate decay style: cosine
DeepSpeed is enabled.
[2021-09-24 05:52:47,464] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed info: version=0.4.2+bc17042, git-hash=bc17042, git-branch=big-science
[2021-09-24 05:52:47,544] [INFO] [engine.py:179:__init__] DeepSpeed Flops Profiler Enabled: False
[2021-09-24 05:52:47,544] [INFO] [engine.py:736:_configure_optimizer] Removing param_group that has no 'params' in the client Optimizer
[2021-09-24 05:52:47,544] [INFO] [engine.py:741:_configure_optimizer] Using client Optimizer as basic optimizer
[2021-09-24 05:52:47,545] [INFO] [engine.py:750:_configure_optimizer] DeepSpeed Basic Optimizer = FusedAdam
[2021-09-24 05:52:47,545] [INFO] [utils.py:43:is_zero_supported_optimizer] Checking ZeRO support for optimizer=FusedAdam type=
[2021-09-24 05:52:47,545] [INFO] [logging.py:68:log_dist] [Rank 0] Creating fp16 ZeRO stage 1 optimizer
[2021-09-24 05:52:47,545] [INFO] [stage2.py:106:__init__] Reduce bucket size 500000000
[2021-09-24 05:52:47,545] [INFO] [stage2.py:107:__init__] Allgather bucket size 500000000
[2021-09-24 05:52:47,545] [INFO] [stage2.py:108:__init__] CPU Offload: False
[2021-09-24 05:52:47,545] [INFO] [stage2.py:109:__init__] Round robin gradient partitioning: False
[2021-09-24 05:52:52,071] [INFO] [stage2.py:419:__init__] optimizer state initialized
[2021-09-24 05:52:52,071] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed Final Optimizer = FusedAdam
[2021-09-24 05:52:52,071] [INFO] [engine.py:553:_configure_lr_scheduler] DeepSpeed using
client LR scheduler
[2021-09-24 05:52:52,071] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed LR Scheduler =
[2021-09-24 05:52:52,072] [INFO] [logging.py:68:log_dist] [Rank 0] step=0, skipped=0, lr=[0.0, 0.0], mom=[(0.9, 0.999), (0.9, 0.999)]
[2021-09-24 05:52:52,072] [INFO] [config.py:900:print] DeepSpeedEngine configuration:
[2021-09-24 05:52:52,072] [INFO] [config.py:904:print] activation_checkpointing_config { "partition_activations": false, "contiguous_memory_optimization": false, "cpu_checkpointing": false, "number_checkpoints": null, "synchronize_checkpoint_boundary": false, "profile": false }
[2021-09-24 05:52:52,072] [INFO] [config.py:904:print] aio_config ................... {'block_size': 1048576, 'queue_depth': 8, 'thread_count': 1, 'single_submit': False, 'overlap_events': True}
[2021-09-24 05:52:52,072] [INFO] [config.py:904:print] allreduce_always_fp32 ........ False
[2021-09-24 05:52:52,072] [INFO] [config.py:904:print] amp_enabled .................. False
[2021-09-24 05:52:52,072] [INFO] [config.py:904:print] amp_params ................... False
[2021-09-24 05:52:52,072] [INFO] [config.py:904:print] checkpoint_tag_validation_enabled True
[2021-09-24 05:52:52,072] [INFO] [config.py:904:print] checkpoint_tag_validation_fail False
[2021-09-24 05:52:52,072] [INFO] [config.py:904:print] disable_allgather ............ False
[2021-09-24 05:52:52,072] [INFO] [config.py:904:print] dump_state ................... False
[2021-09-24 05:52:52,072] [INFO] [config.py:904:print] dynamic_loss_scale_args ...... {'init_scale': 4096, 'scale_window': 500, 'delayed_shift': 2, 'min_scale': 1}
[2021-09-24 05:52:52,072] [INFO] [config.py:904:print] eigenvalue_enabled ........... False
[2021-09-24 05:52:52,072] [INFO] [config.py:904:print] eigenvalue_gas_boundary_resolution 1
[2021-09-24 05:52:52,072] [INFO] [config.py:904:print] eigenvalue_layer_name ........ bert.encoder.layer
[2021-09-24 05:52:52,072] [INFO] [config.py:904:print] eigenvalue_layer_num ......... 0
[2021-09-24 05:52:52,072] [INFO] [config.py:904:print] eigenvalue_max_iter .......... 100
[2021-09-24 05:52:52,072] [INFO] [config.py:904:print] eigenvalue_stability ......... 1e-06
[2021-09-24 05:52:52,072] [INFO] [config.py:904:print] eigenvalue_tol ............... 0.01
[2021-09-24 05:52:52,072] [INFO] [config.py:904:print] eigenvalue_verbose ........... False
[2021-09-24 05:52:52,073] [INFO] [config.py:904:print] elasticity_enabled ........... False
[2021-09-24 05:52:52,073] [INFO] [config.py:904:print] flops_profiler_config ........ { "enabled": false, "profile_step": 1, "module_depth": -1, "top_modules": 1, "detailed": true, "output_file": null }
[2021-09-24 05:52:52,073] [INFO] [config.py:904:print] fp16_enabled ................. True
[2021-09-24 05:52:52,073] [INFO] [config.py:904:print] fp16_mixed_quantize .......... False
[2021-09-24 05:52:52,073] [INFO] [config.py:904:print] global_rank .................. 0
[2021-09-24 05:52:52,073] [INFO] [config.py:904:print] gradient_accumulation_steps .. 256
[2021-09-24 05:52:52,073] [INFO] [config.py:904:print] gradient_clipping ............ 1.0
[2021-09-24 05:52:52,073] [INFO] [config.py:904:print] gradient_predivide_factor .... 1.0
[2021-09-24 05:52:52,073] [INFO] [config.py:904:print] initial_dynamic_scale ........ 4096
[2021-09-24 05:52:52,073] [INFO] [config.py:904:print] loss_scale ................... 0
[2021-09-24 05:52:52,073] [INFO] [config.py:904:print] memory_breakdown ............. False
[2021-09-24 05:52:52,073] [INFO] [config.py:904:print] optimizer_legacy_fusion ...... False
[2021-09-24 05:52:52,073] [INFO] [config.py:904:print] optimizer_name ............... None
[2021-09-24 05:52:52,073] [INFO] [config.py:904:print] optimizer_params ............. None
[2021-09-24 05:52:52,073] [INFO] [config.py:904:print] pipeline ..................... {'stages': 'auto', 'partition': 'best', 'seed_layers': False, 'activation_checkpoint_interval': 0}
[2021-09-24 05:52:52,073] [INFO] [config.py:904:print] pld_enabled .................. False
[2021-09-24 05:52:52,073] [INFO] [config.py:904:print] pld_params ................... False
[2021-09-24 05:52:52,073] [INFO] [config.py:904:print] prescale_gradients ........... False
[2021-09-24 05:52:52,073] [INFO] [config.py:904:print] quantize_change_rate ......... 0.001
[2021-09-24 05:52:52,073] [INFO] [config.py:904:print] quantize_groups .............. 1
[2021-09-24 05:52:52,073] [INFO] [config.py:904:print] quantize_offset .............. 1000
[2021-09-24 05:52:52,073] [INFO] [config.py:904:print] quantize_period .............. 1000
[2021-09-24 05:52:52,073] [INFO] [config.py:904:print] quantize_rounding ............ 0
[2021-09-24 05:52:52,073] [INFO] [config.py:904:print] quantize_start_bits .......... 16
[2021-09-24 05:52:52,073] [INFO] [config.py:904:print] quantize_target_bits ......... 8
[2021-09-24 05:52:52,073] [INFO] [config.py:904:print] quantize_training_enabled .... False
[2021-09-24 05:52:52,073] [INFO] [config.py:904:print] quantize_type ................ 0
[2021-09-24 05:52:52,073] [INFO] [config.py:904:print] quantize_verbose ............. False
[2021-09-24 05:52:52,073] [INFO] [config.py:904:print] scheduler_name ............... None
[2021-09-24 05:52:52,073] [INFO] [config.py:904:print] scheduler_params ............. None
[2021-09-24 05:52:52,073] [INFO] [config.py:904:print] sparse_attention ............. None
[2021-09-24 05:52:52,073] [INFO] [config.py:904:print] sparse_gradients_enabled ..... False
[2021-09-24 05:52:52,073] [INFO] [config.py:904:print] steps_per_print .............. 2000
[2021-09-24 05:52:52,073] [INFO] [config.py:904:print] tensorboard_enabled .......... False
[2021-09-24 05:52:52,073] [INFO] [config.py:904:print] tensorboard_job_name ......... DeepSpeedJobName
[2021-09-24 05:52:52,074] [INFO] [config.py:904:print] tensorboard_output_path ......
[2021-09-24 05:52:52,074] [INFO] [config.py:904:print] train_batch_size ............. 2048
[2021-09-24 05:52:52,074] [INFO] [config.py:904:print] train_micro_batch_size_per_gpu 1
[2021-09-24 05:52:52,074] [INFO] [config.py:904:print] use_quantizer_kernel ......... False
[2021-09-24 05:52:52,074] [INFO] [config.py:904:print] wall_clock_breakdown ......... False
[2021-09-24 05:52:52,074] [INFO] [config.py:904:print] world_size ................... 8
[2021-09-24 05:52:52,074] [INFO] [config.py:904:print] zero_allow_untested_optimizer False
[2021-09-24 05:52:52,074] [INFO] [config.py:904:print] zero_config .................. { "stage": 1, "contiguous_gradients": false, "reduce_scatter": true, "reduce_bucket_size": 5.000000e+08, "allgather_partitions": true, "allgather_bucket_size": 5.000000e+08, "overlap_comm": false, "load_from_fp32_weights": true, "elastic_checkpoint": true, "offload_param": null, "offload_optimizer": null, "sub_group_size": 1.000000e+09, "prefetch_bucket_size": 5.000000e+07, "param_persistence_threshold": 1.000000e+05, "max_live_parameters": 1.000000e+09, "max_reuse_distance": 1.000000e+09, "gather_fp16_weights_on_model_save": false, "ignore_unused_parameters": true, "round_robin_gradients": false, "legacy_stage1": false }
[2021-09-24 05:52:52,074] [INFO] [config.py:904:print] zero_enabled ................. True
[2021-09-24 05:52:52,074] [INFO] [config.py:904:print] zero_optimization_stage ......
1
[2021-09-24 05:52:52,074] [INFO] [config.py:906:print] json = { "train_micro_batch_size_per_gpu": 1, "train_batch_size": 2.048000e+03, "gradient_clipping": 1.0, "zero_optimization": { "stage": 1 }, "fp16": { "enabled": true, "loss_scale": 0, "loss_scale_window": 500, "hysteresis": 2, "min_loss_scale": 1, "initial_scale_power": 12 }, "steps_per_print": 2.000000e+03, "wall_clock_breakdown": false }
[2021-09-24 05:52:52,074] [INFO] [engine.py:76:__init__] CONFIG: micro_batches=256 micro_batch_size=1
[2021-09-24 05:52:52,378] [INFO] [engine.py:134:__init__] RANK=0 STAGE=0 LAYERS=7 [0, 7) STAGE_PARAMS=1986465792 (1986.466M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-24 05:52:52,378] [INFO] [engine.py:134:__init__] RANK=3 STAGE=0 LAYERS=7 [0, 7) STAGE_PARAMS=1986465792 (1986.466M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-24 05:52:52,378] [INFO] [engine.py:134:__init__] RANK=1 STAGE=0 LAYERS=7 [0, 7) STAGE_PARAMS=1986465792 (1986.466M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-24 05:52:52,378] [INFO] [engine.py:134:__init__] RANK=2 STAGE=0 LAYERS=7 [0, 7) STAGE_PARAMS=1986465792 (1986.466M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-24 05:52:52,378] [INFO] [engine.py:134:__init__] RANK=64 STAGE=2 LAYERS=4 [11, 15) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-24 05:52:52,378] [INFO] [engine.py:134:__init__] RANK=66 STAGE=2 LAYERS=4 [11, 15) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-24 05:52:52,378] [INFO] [engine.py:134:__init__] RANK=65 STAGE=2 LAYERS=4 [11, 15) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-24 05:52:52,378] [INFO] [engine.py:134:__init__] RANK=67
STAGE=2 LAYERS=4 [11, 15) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-24 05:52:52,378] [INFO] [engine.py:134:__init__] RANK=195 STAGE=6 LAYERS=4 [27, 31) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-24 05:52:52,378] [INFO] [engine.py:134:__init__] RANK=193 STAGE=6 LAYERS=4 [27, 31) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-24 05:52:52,378] [INFO] [engine.py:134:__init__] RANK=192 STAGE=6 LAYERS=4 [27, 31) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-24 05:52:52,378] [INFO] [engine.py:134:__init__] RANK=194 STAGE=6 LAYERS=4 [27, 31) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-24 05:52:52,378] [INFO] [engine.py:134:__init__] RANK=130 STAGE=4 LAYERS=4 [19, 23) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-24 05:52:52,378] [INFO] [engine.py:134:__init__] RANK=129 STAGE=4 LAYERS=4 [19, 23) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-24 05:52:52,378] [INFO] [engine.py:134:__init__] RANK=128 STAGE=4 LAYERS=4 [19, 23) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-24 05:52:52,378] [INFO] [engine.py:134:__init__] RANK=131 STAGE=4 LAYERS=4 [19, 23) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-24 05:52:52,378] [INFO] [engine.py:134:__init__] RANK=97 STAGE=3 LAYERS=4 [15, 19) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
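The per-rank STAGE_PARAMS and TOTAL_PARAMS figures above are internally consistent and can be cross-checked with a few lines of arithmetic (all numbers taken from the log; tensor-parallel degree 4 and 8 pipeline stages as shown in the topology):

```python
# Cross-check: TOTAL_PARAMS should equal the sum of STAGE_PARAMS over the
# 8 pipeline stages, multiplied by the tensor-parallel degree of 4,
# since each stage's parameters are sharded across 4 tensor-parallel ranks.
tp_degree = 4
stage_params = [1986465792] + [1745293312] * 6 + [1986498560]  # stages 0..7

total_params = tp_degree * sum(stage_params)
print(total_params)  # 57778896896, matching TOTAL_PARAMS in the log

# UNIQUE_PARAMS is smaller because the input and output EmbeddingPipe
# layers (stages 0 and 7) share tied weights, so they are counted once.
unique_params = 56814206976  # from the log
tied_duplicate = total_params - unique_params
```

The gap between TOTAL_PARAMS and UNIQUE_PARAMS (about 0.96B parameters) is exactly the second copy of the tied embedding matrix.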
[2021-09-24 05:52:52,378] [INFO] [engine.py:134:__init__] RANK=96 STAGE=3 LAYERS=4 [15, 19) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-24 05:52:52,378] [INFO] [engine.py:134:__init__] RANK=98 STAGE=3 LAYERS=4 [15, 19) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-24 05:52:52,378] [INFO] [engine.py:134:__init__] RANK=32 STAGE=1 LAYERS=4 [7, 11) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-24 05:52:52,378] [INFO] [engine.py:134:__init__] RANK=35 STAGE=1 LAYERS=4 [7, 11) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-24 05:52:52,378] [INFO] [engine.py:134:__init__] RANK=34 STAGE=1 LAYERS=4 [7, 11) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-24 05:52:52,378] [INFO] [engine.py:134:__init__] RANK=33 STAGE=1 LAYERS=4 [7, 11) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-24 05:52:52,378] [INFO] [engine.py:134:__init__] RANK=160 STAGE=5 LAYERS=4 [23, 27) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-24 05:52:52,378] [INFO] [engine.py:134:__init__] RANK=161 STAGE=5 LAYERS=4 [23, 27) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-24 05:52:52,378] [INFO] [engine.py:134:__init__] RANK=224 STAGE=7 LAYERS=8 [31, 39) STAGE_PARAMS=1986498560 (1986.499M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-24 05:52:52,378] [INFO] [engine.py:134:__init__] RANK=227 STAGE=7 LAYERS=8 [31, 39) STAGE_PARAMS=1986498560 (1986.499M) TOTAL_PARAMS=57778896896
(57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-24 05:52:52,378] [INFO] [engine.py:134:__init__] RANK=226 STAGE=7 LAYERS=8 [31, 39) STAGE_PARAMS=1986498560 (1986.499M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-24 05:52:52,378] [INFO] [engine.py:134:__init__] RANK=225 STAGE=7 LAYERS=8 [31, 39) STAGE_PARAMS=1986498560 (1986.499M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-24 05:52:52,378] [INFO] [engine.py:134:__init__] RANK=99 STAGE=3 LAYERS=4 [15, 19) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-24 05:52:52,378] [INFO] [engine.py:134:__init__] RANK=163 STAGE=5 LAYERS=4 [23, 27) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-24 05:52:52,378] [INFO] [engine.py:134:__init__] RANK=162 STAGE=5 LAYERS=4 [23, 27) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
 > using checkpoint value 6e-05 for learning rate
 > using checkpoint value 6e-06 for minimum learning rate
 > using checkpoint value 216320 for warmup iterations
 > using checkpoint value 126953125 for total number of iterations
 > using checkpoint value cosine for decay style
successfully loaded 8 ZeRO state_dicts for rank 168
successfully loaded 8 ZeRO state_dicts for rank 171
successfully loaded 8 ZeRO state_dicts for rank 176
successfully loaded 8 ZeRO state_dicts for rank 88
successfully loaded 8 ZeRO state_dicts for rank 170
successfully loaded 8 ZeRO state_dicts for rank 132
successfully loaded 8 ZeRO state_dicts for rank 156
successfully loaded 8 ZeRO state_dicts for rank 169
successfully loaded 8 ZeRO state_dicts for rank 159
successfully loaded 8 ZeRO state_dicts for rank 124
successfully loaded 8 ZeRO state_dicts for rank 32
successfully loaded 8 ZeRO state_dicts for rank 49
successfully loaded 8
ZeRO state_dicts for rank 96
successfully loaded 8 ZeRO state_dicts for rank 167
successfully loaded 8 ZeRO state_dicts for rank 127
successfully loaded 8 ZeRO state_dicts for rank 60
successfully loaded 8 ZeRO state_dicts for rank 148
successfully loaded 8 ZeRO state_dicts for rank 48
successfully loaded 8 ZeRO state_dicts for rank 99
successfully loaded 8 ZeRO state_dicts for rank 140
successfully loaded 8 ZeRO state_dicts for rank 144
successfully loaded 8 ZeRO state_dicts for rank 104
successfully loaded 8 ZeRO state_dicts for rank 112
successfully loaded 8 ZeRO state_dicts for rank 68
successfully loaded 8 ZeRO state_dicts for rank 120
loading 8 zero partition checkpoints for rank 168
successfully loaded 8 ZeRO state_dicts for rank 193
successfully loaded 8 ZeRO state_dicts for rank 210
successfully loaded 8 ZeRO state_dicts for rank 69
successfully loaded 8 ZeRO state_dicts for rank 52
successfully loaded 8 ZeRO state_dicts for rank 157
successfully loaded 8 ZeRO state_dicts for rank 40
successfully loaded 8 ZeRO state_dicts for rank 129
successfully loaded 8 ZeRO state_dicts for rank 201
successfully loaded 8 ZeRO state_dicts for rank 209
successfully loaded 8 ZeRO state_dicts for rank 145
successfully loaded 8 ZeRO state_dicts for rank 111
successfully loaded 8 ZeRO state_dicts for rank 211
successfully loaded 8 ZeRO state_dicts for rank 135
successfully loaded 8 ZeRO state_dicts for rank 141
successfully loaded 8 ZeRO state_dicts for rank 139
successfully loaded 8 ZeRO state_dicts for rank 172
successfully loaded 8 ZeRO state_dicts for rank 80
successfully loaded 8 ZeRO state_dicts for rank 215
successfully loaded 8 ZeRO state_dicts for rank 106
successfully loaded 8 ZeRO state_dicts for rank 187
successfully loaded 8 ZeRO state_dicts for rank 137
successfully loaded 8 ZeRO state_dicts for rank 133
successfully loaded 8 ZeRO state_dicts for rank 90
successfully loaded 8 ZeRO state_dicts for rank 74
successfully loaded 8 ZeRO state_dicts for rank 34
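The rank numbers in these checkpoint-loading messages are the global ranks from the topology dump printed earlier. A minimal sketch of that mapping (dp=8, tp=4, pp=8 in this run; model is the fastest-varying axis, then data, then pipe):

```python
# Sketch of how the "Using topology" dump maps a ProcessCoord
# (pipe, data, model) to a global rank: model varies fastest,
# then data, then pipe, over 8 * 8 * 4 = 256 ranks.
DP, TP = 8, 4  # data-parallel and tensor-parallel degrees from the log

def coord_to_rank(pipe: int, data: int, model: int) -> int:
    return pipe * (DP * TP) + data * TP + model

# Spot-checks against entries in the topology dump above:
print(coord_to_rank(0, 0, 1))  # 1
print(coord_to_rank(1, 0, 0))  # 32
print(coord_to_rank(7, 7, 3))  # 255
```

Each rank loads 8 ZeRO state_dicts because ZeRO stage 1 partitions optimizer state across the 8 data-parallel replicas, and all shards are read back on resume.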
successfully loaded 8 ZeRO state_dicts for rank 143 successfully loaded 8 ZeRO state_dicts for rank 200 successfully loaded 8 ZeRO state_dicts for rank 122 successfully loaded 8 ZeRO state_dicts for rank 125 successfully loaded 8 ZeRO state_dicts for rank 228 successfully loaded 8 ZeRO state_dicts for rank 81 successfully loaded 8 ZeRO state_dicts for rank 105 successfully loaded 8 ZeRO state_dicts for rank 163 successfully loaded 8 ZeRO state_dicts for rank 64 successfully loaded 8 ZeRO state_dicts for rank 186 successfully loaded 8 ZeRO state_dicts for rank 97 successfully loaded 8 ZeRO state_dicts for rank 70 successfully loaded 8 ZeRO state_dicts for rank 51 successfully loaded 8 ZeRO state_dicts for rank 77 successfully loaded 8 ZeRO state_dicts for rank 160 successfully loaded 8 ZeRO state_dicts for rank 50 successfully loaded 8 ZeRO state_dicts for rank 202 successfully loaded 8 ZeRO state_dicts for rank 98 successfully loaded 8 ZeRO state_dicts for rank 20 successfully loaded 8 ZeRO state_dicts for rank 85 successfully loaded 8 ZeRO state_dicts for rank 89 successfully loaded 8 ZeRO state_dicts for rank 214 successfully loaded 8 ZeRO state_dicts for rank 114 successfully loaded 8 ZeRO state_dicts for rank 149 successfully loaded 8 ZeRO state_dicts for rank 123 successfully loaded 8 ZeRO state_dicts for rank 71 successfully loaded 8 ZeRO state_dicts for rank 126 successfully loaded 8 ZeRO state_dicts for rank 152 successfully loaded 8 ZeRO state_dicts for rank 203 successfully loaded 8 ZeRO state_dicts for rank 166 successfully loaded 8 ZeRO state_dicts for rank 41 successfully loaded 8 ZeRO state_dicts for rank 222 successfully loaded 8 ZeRO state_dicts for rank 130 successfully loaded 8 ZeRO state_dicts for rank 216 successfully loaded 8 ZeRO state_dicts for rank 84 successfully loaded 8 ZeRO state_dicts for rank 100 successfully loaded 8 ZeRO state_dicts for rank 42 successfully loaded 8 ZeRO state_dicts for rank 190 successfully loaded 8 ZeRO state_dicts 
for rank 12 successfully loaded 8 ZeRO state_dicts for rank 44 successfully loaded 8 ZeRO state_dicts for rank 108 successfully loaded 8 ZeRO state_dicts for rank 219 successfully loaded 8 ZeRO state_dicts for rank 206 successfully loaded 8 ZeRO state_dicts for rank 128 successfully loaded 8 ZeRO state_dicts for rank 37 successfully loaded 8 ZeRO state_dicts for rank 33 successfully loaded 8 ZeRO state_dicts for rank 56 successfully loaded 8 ZeRO state_dicts for rank 62 successfully loaded 8 ZeRO state_dicts for rank 115 successfully loaded 8 ZeRO state_dicts for rank 24 successfully loaded 8 ZeRO state_dicts for rank 45 successfully loaded 8 ZeRO state_dicts for rank 192 successfully loaded 8 ZeRO state_dicts for rank 153 successfully loaded 8 ZeRO state_dicts for rank 134 successfully loaded 8 ZeRO state_dicts for rank 136 successfully loaded 8 ZeRO state_dicts for rank 38 successfully loaded 8 ZeRO state_dicts for rank 131 successfully loaded 8 ZeRO state_dicts for rank 121 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-24 05:53:20 CEST)" was missed by 0:00:03.058626 successfully loaded 8 ZeRO state_dicts for rank 217 successfully loaded 8 ZeRO state_dicts for rank 146 successfully loaded 8 ZeRO state_dicts for rank 195 successfully loaded 8 ZeRO state_dicts for rank 82 successfully loaded 8 ZeRO state_dicts for rank 191 successfully loaded 8 ZeRO state_dicts for rank 113 successfully loaded 8 ZeRO state_dicts for rank 158 successfully loaded 8 ZeRO state_dicts for rank 208 loading 8 zero partition checkpoints for rank 176 successfully loaded 8 ZeRO state_dicts for rank 65 successfully loaded 8 ZeRO state_dicts for rank 78 successfully loaded 8 ZeRO state_dicts for rank 93 successfully loaded 8 ZeRO state_dicts for rank 188 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-24 
05:53:20 CEST)" was missed by 0:00:03.434951 successfully loaded 8 ZeRO state_dicts for rank 162 successfully loaded 8 ZeRO state_dicts for rank 63 successfully loaded 8 ZeRO state_dicts for rank 61 successfully loaded 8 ZeRO state_dicts for rank 221 successfully loaded 8 ZeRO state_dicts for rank 107 successfully loaded 8 ZeRO state_dicts for rank 179 successfully loaded 8 ZeRO state_dicts for rank 147 successfully loaded 8 ZeRO state_dicts for rank 36 loading 8 zero partition checkpoints for rank 132 successfully loaded 8 ZeRO state_dicts for rank 116 successfully loaded 8 ZeRO state_dicts for rank 199 loading 8 zero partition checkpoints for rank 88 loading 8 zero partition checkpoints for rank 170 successfully loaded 8 ZeRO state_dicts for rank 151 successfully loaded 8 ZeRO state_dicts for rank 76 successfully loaded 8 ZeRO state_dicts for rank 35 successfully loaded 8 ZeRO state_dicts for rank 223 successfully loaded 8 ZeRO state_dicts for rank 175 successfully loaded 8 ZeRO state_dicts for rank 13 successfully loaded 8 ZeRO state_dicts for rank 207 successfully loaded 8 ZeRO state_dicts for rank 218 successfully loaded 8 ZeRO state_dicts for rank 213 successfully loaded 8 ZeRO state_dicts for rank 119 successfully loaded 8 ZeRO state_dicts for rank 198 successfully loaded 8 ZeRO state_dicts for rank 164 loading 8 zero partition checkpoints for rank 159 successfully loaded 8 ZeRO state_dicts for rank 109 successfully loaded 8 ZeRO state_dicts for rank 197 successfully loaded 8 ZeRO state_dicts for rank 66 successfully loaded 8 ZeRO state_dicts for rank 22 successfully loaded 8 ZeRO state_dicts for rank 185 successfully loaded 8 ZeRO state_dicts for rank 196 successfully loaded 8 ZeRO state_dicts for rank 43 successfully loaded 8 ZeRO state_dicts for rank 204 successfully loaded 8 ZeRO state_dicts for rank 205 successfully loaded 8 ZeRO state_dicts for rank 181 successfully loaded 8 ZeRO state_dicts for rank 25 successfully loaded 8 ZeRO state_dicts for rank 
91 successfully loaded 8 ZeRO state_dicts for rank 212 successfully loaded 8 ZeRO state_dicts for rank 173 successfully loaded 8 ZeRO state_dicts for rank 39 successfully loaded 8 ZeRO state_dicts for rank 161 successfully loaded 8 ZeRO state_dicts for rank 29 successfully loaded 8 ZeRO state_dicts for rank 26 successfully loaded 8 ZeRO state_dicts for rank 180 successfully loaded 8 ZeRO state_dicts for rank 28 successfully loaded 8 ZeRO state_dicts for rank 87 successfully loaded 8 ZeRO state_dicts for rank 53 successfully loaded 8 ZeRO state_dicts for rank 194 successfully loaded 8 ZeRO state_dicts for rank 54 successfully loaded 8 ZeRO state_dicts for rank 73 successfully loaded 8 ZeRO state_dicts for rank 21 successfully loaded 8 ZeRO state_dicts for rank 27 successfully loaded 8 ZeRO state_dicts for rank 46 successfully loaded 8 ZeRO state_dicts for rank 67 loading 8 zero partition checkpoints for rank 32 successfully loaded 8 ZeRO state_dicts for rank 184 successfully loaded 8 ZeRO state_dicts for rank 165 successfully loaded 8 ZeRO state_dicts for rank 118 successfully loaded 8 ZeRO state_dicts for rank 220 successfully loaded 8 ZeRO state_dicts for rank 57 successfully loaded 8 ZeRO state_dicts for rank 75 successfully loaded 8 ZeRO state_dicts for rank 0 successfully loaded 8 ZeRO state_dicts for rank 92 loading 8 zero partition checkpoints for rank 124 successfully loaded 8 ZeRO state_dicts for rank 94 successfully loaded 8 ZeRO state_dicts for rank 55 successfully loaded 8 ZeRO state_dicts for rank 72 successfully loaded 8 ZeRO state_dicts for rank 83 successfully loaded 8 ZeRO state_dicts for rank 6 successfully loaded 8 ZeRO state_dicts for rank 86 successfully loaded 8 ZeRO state_dicts for rank 189 successfully loaded 8 ZeRO state_dicts for rank 5 successfully loaded 8 ZeRO state_dicts for rank 117 successfully loaded 8 ZeRO state_dicts for rank 4 successfully loaded 8 ZeRO state_dicts for rank 30 successfully loaded 8 ZeRO state_dicts for rank 155 
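An aside on the scheduler state restored above (learning rate 6e-05, 216320 warmup, cosine decay): it lines up with the rates logged once training resumes further down, provided the "warmup iterations" count is read in samples rather than optimizer steps. At 15088 consumed samples the log reports a learning rate of 4.185E-06, which is exactly 6e-05 × 15088 / 216320. A minimal sketch of that linear-warmup rule (an interpretation of the logged numbers, not the actual training code):

```python
def warmup_lr(consumed_samples, max_lr=6e-05, warmup_samples=216320):
    """Linear learning-rate warmup, assuming the checkpoint's
    '216320 warmup iterations' is counted in samples (inferred
    from the logged values, not taken from the training code)."""
    frac = min(consumed_samples, warmup_samples) / warmup_samples
    return max_lr * frac

# Reproduces the rates logged for iterations 943 and 944:
print(f"{warmup_lr(15088):.3E}")  # 4.185E-06
print(f"{warmup_lr(15104):.3E}")  # 4.189E-06
```

The cosine decay toward the 6e-06 minimum only kicks in after warmup, so it does not affect these early iterations.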
successfully loaded 8 ZeRO state_dicts for rank 1
[... the interleaved "successfully loaded 8 ZeRO state_dicts for rank N" / "loading 8 zero partition checkpoints for rank N" messages continue for the remaining ranks, now including the last pipeline stage (ranks 224-255) ...]
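Auditing hundreds of interleaved per-rank messages by eye is error-prone; a short script (a sketch assuming only the two message formats shown in this log) can confirm that every rank reported both the state_dict load and the partition-checkpoint load:

```python
import re

def unfinished_ranks(log_text, world_size=256):
    """Return ranks that did not log BOTH phases:
    'successfully loaded <K> ZeRO state_dicts for rank N' and
    'loading <K> zero partition checkpoints for rank N'."""
    loaded = set(map(int, re.findall(
        r"successfully loaded \d+ ZeRO state_dicts for rank (\d+)", log_text)))
    partitioned = set(map(int, re.findall(
        r"loading \d+ zero partition checkpoints for rank (\d+)", log_text)))
    return sorted(set(range(world_size)) - (loaded & partitioned))

# Tiny example: rank 1 never reports the partition-checkpoint phase.
log = ("successfully loaded 8 ZeRO state_dicts for rank 0 "
       "loading 8 zero partition checkpoints for rank 0 "
       "successfully loaded 8 ZeRO state_dicts for rank 1")
print(unfinished_ranks(log, world_size=2))  # [1]
```

Run against the full log, an empty result means all 256 ranks completed both phases.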
successfully loaded 8 ZeRO state_dicts for rank 233
[... further interleaved per-rank load confirmations ...]
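Once the data pipeline comes up below, it reports a 304230423-document corpus split as train [0, 288714672), validation [288714672, 303926193), test [303926193, 304230423). Those boundaries are consistent with a 949/50/1 weighting where each cumulative boundary is rounded and the result is then shifted so the last boundary lands exactly on the corpus size. A sketch of that rule (inferred from the logged numbers; the actual Megatron split code may differ in details):

```python
def split_boundaries(num_docs, weights=(949, 50, 1)):
    """Cumulative rounded train/valid/test boundaries, shifted so the
    final boundary equals num_docs (rule inferred from the log)."""
    total = sum(weights)
    bounds = [0]
    for w in weights:
        bounds.append(bounds[-1] + int(round(w / total * num_docs)))
    diff = bounds[-1] - num_docs          # rounding drift, if any
    bounds[1:] = [b - diff for b in bounds[1:]]
    return bounds

print(split_boundaries(304230423))  # [0, 288714672, 303926193, 304230423]
```

Without the final shift, plain rounding would end one document short (304230422), so the drift correction is what makes the boundaries match the log exactly.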
loading 8 zero partition checkpoints for rank 128
[... "loading 8 zero partition checkpoints for rank N" confirmations for the remaining ranks ...]
checkpoint version 3.0
[... final per-rank confirmations, ending with ranks 16-19 ...]
successfully loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints at iteration 942
time (ms) | load-checkpoint: 82978.97
[after model, optimizer, and learning rate scheduler are built] datetime: 2021-09-24 05:54:15
> building train, validation, and test
datasets ...
> datasets target sizes (minimum size):
    train:      300000000
    validation: 1638400
    test:       10240
> building train, validation, and test datasets for GPT ...
 > building dataset index ...
    reading sizes...
    reading pointers...
    reading document index...
    creating numpy buffer of mmap...
    creating memory view of numpy buffer...
 > finished creating indexed dataset in 0.135933 seconds
    number of documents: 304230423
 > dataset split:
    train:      document indices in [0, 288714672) total of 288714672 documents
    validation: document indices in [288714672, 303926193) total of 15211521 documents
    test:       document indices in [303926193, 304230423) total of 304230 documents
 > loading doc-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_train_indexmap_300000000ns_2048sl_42s_doc_idx.npy
 > loading sample-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_train_indexmap_300000000ns_2048sl_42s_sample_idx.npy
 > loading shuffle-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_train_indexmap_300000000ns_2048sl_42s_shuffle_idx.npy
    loaded indexed file in 0.348 seconds
    total number of samples: 394611670
    total number of epochs: 3
 > loading doc-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_valid_indexmap_1638400ns_2048sl_42s_doc_idx.npy
 > loading sample-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_valid_indexmap_1638400ns_2048sl_42s_sample_idx.npy
 > loading shuffle-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_valid_indexmap_1638400ns_2048sl_42s_shuffle_idx.npy
    loaded indexed file in 0.321 seconds
    total number of samples: 6927161
    total number of epochs: 1
 > loading doc-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_test_indexmap_10240ns_2048sl_42s_doc_idx.npy
 > loading sample-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_test_indexmap_10240ns_2048sl_42s_sample_idx.npy
 > loading shuffle-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_test_indexmap_10240ns_2048sl_42s_shuffle_idx.npy
    loaded indexed file in 0.062 seconds
    total number of samples: 137384
    total number of epochs: 1
> finished creating GPT datasets ...
[after dataloaders are built] datetime: 2021-09-24 05:54:21
done with setup ...
training ...
time (ms) | model-and-optimizer-setup: 91017.54 | train/valid/test-data-iterators-setup: 4740.91
[before the start of training step] datetime: 2021-09-24 05:54:21
[2021-09-24 05:54:21,235] [INFO] [checkpointing.py:408:forward] Activation Checkpointing Information
[2021-09-24 05:54:21,235] [INFO] [checkpointing.py:409:forward] ----Partition Activations False, CPU CHECKPOINTING False
[2021-09-24 05:54:21,235] [INFO] [checkpointing.py:412:forward] ----contiguous Memory Checkpointing False with 32 total layers
[2021-09-24 05:54:21,235] [INFO] [checkpointing.py:415:forward] ----Synchronization False
[2021-09-24 05:54:21,235] [INFO] [checkpointing.py:416:forward] ----Profiling time in checkpointing False
[Rank 1] (after 943 iterations) memory (MB) | allocated: 6661.611328125 | max allocated: 11742.55810546875 | reserved: 22890.0 | max reserved: 22890.0
[... analogous per-rank memory reports: first-stage ranks (0-3) allocate ~6661.6 MB (max 11742.6 MB), last-stage ranks (224-227) ~7107.7 MB (max 11884.7 MB), middle-stage ranks ~5861.5 MB (max 10450.5 MB), with roughly 18.4-23.5 GB reserved per GPU ...]
iteration 943/ 159576 | consumed samples: 15088 | elapsed time per iteration (ms): 29806.1 | learning rate: 4.185E-06 | global batch size: 16 | lm loss: 7.642442E+00 | loss scale: 8192.0 | grad norm: 53639.718 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 944/ 159576 | consumed samples: 15104 | elapsed time per iteration (ms): 13012.2 | learning rate: 4.189E-06 | global batch size: 16 | lm loss: 7.638637E+00 | loss scale: 8192.0 | grad norm: 47002.321 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 945/ 159576 | consumed samples: 15120 | elapsed time per iteration (ms): 13551.8 | learning rate: 4.194E-06 | global batch size: 16 | lm loss: 7.559312E+00 | loss scale: 8192.0 | grad norm: 43680.206 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 946/ 159576 | consumed samples: 15136 | elapsed time per iteration (ms): 13672.0 | learning rate: 4.198E-06 | global batch size: 16 | lm loss: 7.372701E+00 | loss scale: 8192.0 | grad norm: 29642.562 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 947/ 159576 | consumed samples: 15152 | elapsed time per iteration (ms): 13523.5 | learning rate: 4.203E-06 | global batch size: 16 | lm loss: 7.431667E+00 | loss scale: 8192.0 | grad norm: 71525.963 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 948/ 159576 | consumed samples: 15168 | elapsed time per iteration (ms): 13571.1 | learning rate: 4.207E-06 | global batch size: 16 | lm loss: 7.622519E+00 | loss scale: 8192.0 | grad norm: 108314.372 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 949/ 159576 | consumed samples: 15184 | elapsed time per iteration (ms): 13513.7 | learning rate: 4.212E-06 | global batch size: 16 | lm loss: 7.491040E+00 | loss scale: 8192.0 | grad norm: 83775.616 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 950/ 159576 | consumed samples: 15200 | elapsed time per iteration (ms): 13857.2 | learning rate: 4.216E-06 | global batch size: 16 | lm loss: 7.689845E+00 | loss scale: 8192.0 | grad norm: 42694.796 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 951/ 159576 | consumed samples: 15216 | elapsed time per iteration (ms): 13556.0 | learning rate: 4.220E-06 | global batch size: 16 | lm loss: 7.541234E+00 | loss scale: 8192.0 | grad norm: 36744.623 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 952/ 159576 | consumed samples: 15232 | elapsed time per iteration (ms): 13565.0 | learning rate: 4.225E-06 | global batch size: 16 | lm loss: 7.402619E+00 | loss scale: 8192.0 | grad norm: 37335.008 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 953/ 159576 | consumed samples: 15248 | elapsed time per iteration (ms): 13600.8 | learning rate: 4.229E-06 | global batch size: 16 | lm loss: 7.524664E+00 | loss scale: 8192.0 | grad norm: 36490.188 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 954/ 159576 | consumed samples: 15264 | elapsed time per iteration (ms): 13538.1 | learning rate: 4.234E-06 | global batch size: 16 | lm loss: 6.926525E+00 | loss scale: 8192.0 | grad norm: 28573.010 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 955/ 159576 | consumed samples: 15280 | elapsed time per iteration (ms): 13767.3 | learning rate: 4.238E-06 | global batch size: 16 | lm loss: 7.564863E+00 | loss scale: 8192.0 | grad norm: 45556.471 | num zeros: 0.0 |
number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 956/ 159576 | consumed samples: 15296 | elapsed time per iteration (ms): 13529.6 | learning rate: 4.243E-06 | global batch size: 16 | lm loss: 7.518897E+00 | loss scale: 8192.0 | grad norm: 40483.089 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 957/ 159576 | consumed samples: 15312 | elapsed time per iteration (ms): 13548.2 | learning rate: 4.247E-06 | global batch size: 16 | lm loss: 7.292015E+00 | loss scale: 8192.0 | grad norm: 27123.950 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 958/ 159576 | consumed samples: 15328 | elapsed time per iteration (ms): 13592.2 | learning rate: 4.251E-06 | global batch size: 16 | lm loss: 7.645267E+00 | loss scale: 8192.0 | grad norm: 45895.591 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 959/ 159576 | consumed samples: 15344 | elapsed time per iteration (ms): 13834.7 | learning rate: 4.256E-06 | global batch size: 16 | lm loss: 7.439256E+00 | loss scale: 8192.0 | grad norm: 47827.958 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 960/ 159576 | consumed samples: 15360 | elapsed time per iteration (ms): 13548.7 | learning rate: 4.260E-06 | global batch size: 16 | lm loss: 7.398325E+00 | loss scale: 8192.0 | grad norm: 41514.249 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 961/ 159576 | consumed samples: 15376 | elapsed time per iteration (ms): 13540.1 | learning rate: 4.265E-06 | global batch size: 16 | lm loss: 7.498395E+00 | loss scale: 8192.0 | grad norm: 24323.912 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 962/ 159576 | consumed samples: 15392 | elapsed time per iteration (ms): 13596.3 | learning rate: 
4.269E-06 | global batch size: 16 | lm loss: 7.458749E+00 | loss scale: 8192.0 | grad norm: 37806.541 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 963/ 159576 | consumed samples: 15408 | elapsed time per iteration (ms): 13925.1 | learning rate: 4.274E-06 | global batch size: 16 | lm loss: 7.414832E+00 | loss scale: 8192.0 | grad norm: 38291.446 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 964/ 159576 | consumed samples: 15424 | elapsed time per iteration (ms): 13505.9 | learning rate: 4.278E-06 | global batch size: 16 | lm loss: 7.552760E+00 | loss scale: 8192.0 | grad norm: 23290.618 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 965/ 159576 | consumed samples: 15440 | elapsed time per iteration (ms): 13598.7 | learning rate: 4.283E-06 | global batch size: 16 | lm loss: 7.566991E+00 | loss scale: 8192.0 | grad norm: 33429.496 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 966/ 159576 | consumed samples: 15456 | elapsed time per iteration (ms): 13495.5 | learning rate: 4.287E-06 | global batch size: 16 | lm loss: 7.727429E+00 | loss scale: 8192.0 | grad norm: 33196.940 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 967/ 159576 | consumed samples: 15472 | elapsed time per iteration (ms): 13508.3 | learning rate: 4.291E-06 | global batch size: 16 | lm loss: 7.517751E+00 | loss scale: 8192.0 | grad norm: 25674.592 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 968/ 159576 | consumed samples: 15488 | elapsed time per iteration (ms): 13747.8 | learning rate: 4.296E-06 | global batch size: 16 | lm loss: 7.534285E+00 | loss scale: 8192.0 | grad norm: 28899.517 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 
0 | time (ms) iteration 969/ 159576 | consumed samples: 15504 | elapsed time per iteration (ms): 13541.9 | learning rate: 4.300E-06 | global batch size: 16 | lm loss: 7.412315E+00 | loss scale: 8192.0 | grad norm: 23856.723 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 970/ 159576 | consumed samples: 15520 | elapsed time per iteration (ms): 13581.6 | learning rate: 4.305E-06 | global batch size: 16 | lm loss: 7.574214E+00 | loss scale: 8192.0 | grad norm: 26912.399 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 971/ 159576 | consumed samples: 15536 | elapsed time per iteration (ms): 13575.2 | learning rate: 4.309E-06 | global batch size: 16 | lm loss: 7.489717E+00 | loss scale: 8192.0 | grad norm: 25683.773 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 972/ 159576 | consumed samples: 15552 | elapsed time per iteration (ms): 14047.8 | learning rate: 4.314E-06 | global batch size: 16 | lm loss: 7.479139E+00 | loss scale: 8192.0 | grad norm: 23963.457 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 973/ 159576 | consumed samples: 15568 | elapsed time per iteration (ms): 13519.1 | learning rate: 4.318E-06 | global batch size: 16 | lm loss: 7.557629E+00 | loss scale: 8192.0 | grad norm: 28281.687 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 974/ 159576 | consumed samples: 15584 | elapsed time per iteration (ms): 13508.3 | learning rate: 4.322E-06 | global batch size: 16 | lm loss: 7.324095E+00 | loss scale: 8192.0 | grad norm: 24628.133 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 975/ 159576 | consumed samples: 15600 | elapsed time per iteration (ms): 13557.4 | learning rate: 4.327E-06 | global batch size: 16 | lm loss: 7.551218E+00 | 
loss scale: 8192.0 | grad norm: 22604.906 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 976/ 159576 | consumed samples: 15616 | elapsed time per iteration (ms): 13573.2 | learning rate: 4.331E-06 | global batch size: 16 | lm loss: 7.421384E+00 | loss scale: 8192.0 | grad norm: 25754.693 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 977/ 159576 | consumed samples: 15632 | elapsed time per iteration (ms): 13891.1 | learning rate: 4.336E-06 | global batch size: 16 | lm loss: 7.421275E+00 | loss scale: 8192.0 | grad norm: 23427.022 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 978/ 159576 | consumed samples: 15648 | elapsed time per iteration (ms): 13578.3 | learning rate: 4.340E-06 | global batch size: 16 | lm loss: 7.468715E+00 | loss scale: 8192.0 | grad norm: 25697.467 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 979/ 159576 | consumed samples: 15664 | elapsed time per iteration (ms): 13602.5 | learning rate: 4.345E-06 | global batch size: 16 | lm loss: 7.679566E+00 | loss scale: 8192.0 | grad norm: 25403.982 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 980/ 159576 | consumed samples: 15680 | elapsed time per iteration (ms): 13628.8 | learning rate: 4.349E-06 | global batch size: 16 | lm loss: 7.442289E+00 | loss scale: 8192.0 | grad norm: 30230.032 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 981/ 159576 | consumed samples: 15696 | elapsed time per iteration (ms): 13812.5 | learning rate: 4.354E-06 | global batch size: 16 | lm loss: 7.521616E+00 | loss scale: 8192.0 | grad norm: 29030.478 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 982/ 159576 | consumed samples: 
15712 | elapsed time per iteration (ms): 13617.0 | learning rate: 4.358E-06 | global batch size: 16 | lm loss: 7.595479E+00 | loss scale: 8192.0 | grad norm: 32518.623 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) [2021-09-24 06:03:44] PULSE: tr8-104B is waiting for the previous Job Array job to finish before scheduling a new one (1162855_[2-10%1] on 'gpu_p13' partition) [2021-09-24 06:03:44] PULSE: tr8-104B is running for 11:33 since 2021-09-24T05:52:11 (1162855_1 on 'gpu_p13' partition (r6i4n[5,7],r6i5n[2,7-8],r6i6n[0,2,6],r7i2n[4-5],r7i6n[2-4],r7i7n[7-8],r8i0n[2-3,5-8],r8i1n[0,2-4],r8i2n8,r8i3n[0-2],r8i5n[3-4],r8i7n[3-8],r9i0n[0-2],r9i1n[0-3],r9i2n[3-5,8],r9i3n[0-1,7-8],r9i4n[0-2],r9i5n[3-8],r9i6n[0,7-8]) iteration 983/ 159576 | consumed samples: 15728 | elapsed time per iteration (ms): 13560.9 | learning rate: 4.362E-06 | global batch size: 16 | lm loss: 7.437976E+00 | loss scale: 8192.0 | grad norm: 25658.380 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 984/ 159576 | consumed samples: 15744 | elapsed time per iteration (ms): 13555.5 | learning rate: 4.367E-06 | global batch size: 16 | lm loss: 7.561976E+00 | loss scale: 8192.0 | grad norm: 28146.514 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 985/ 159576 | consumed samples: 15760 | elapsed time per iteration (ms): 13993.9 | learning rate: 4.371E-06 | global batch size: 16 | lm loss: 7.526425E+00 | loss scale: 8192.0 | grad norm: 22789.409 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 986/ 159576 | consumed samples: 15776 | elapsed time per iteration (ms): 13819.4 | learning rate: 4.376E-06 | global batch size: 16 | lm loss: 7.568769E+00 | loss scale: 8192.0 | grad norm: 29742.595 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 987/ 159576 
| consumed samples: 15792 | elapsed time per iteration (ms): 13655.7 | learning rate: 4.380E-06 | global batch size: 16 | lm loss: 7.516987E+00 | loss scale: 8192.0 | grad norm: 29352.083 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 988/ 159576 | consumed samples: 15808 | elapsed time per iteration (ms): 13528.1 | learning rate: 4.385E-06 | global batch size: 16 | lm loss: 7.482485E+00 | loss scale: 8192.0 | grad norm: 23020.708 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 989/ 159576 | consumed samples: 15824 | elapsed time per iteration (ms): 13534.2 | learning rate: 4.389E-06 | global batch size: 16 | lm loss: 7.601320E+00 | loss scale: 8192.0 | grad norm: 23202.245 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 990/ 159576 | consumed samples: 15840 | elapsed time per iteration (ms): 13617.6 | learning rate: 4.393E-06 | global batch size: 16 | lm loss: 7.522967E+00 | loss scale: 8192.0 | grad norm: 26298.479 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 991/ 159576 | consumed samples: 15856 | elapsed time per iteration (ms): 13569.7 | learning rate: 4.398E-06 | global batch size: 16 | lm loss: 7.564295E+00 | loss scale: 8192.0 | grad norm: 30127.017 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 992/ 159576 | consumed samples: 15872 | elapsed time per iteration (ms): 13596.4 | learning rate: 4.402E-06 | global batch size: 16 | lm loss: 7.530395E+00 | loss scale: 8192.0 | grad norm: 25061.967 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 993/ 159576 | consumed samples: 15888 | elapsed time per iteration (ms): 13641.4 | learning rate: 4.407E-06 | global batch size: 16 | lm loss: 7.547958E+00 | loss scale: 8192.0 | grad norm: 
24314.301 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 994/ 159576 | consumed samples: 15904 | elapsed time per iteration (ms): 13912.4 | learning rate: 4.411E-06 | global batch size: 16 | lm loss: 7.429228E+00 | loss scale: 8192.0 | grad norm: 28339.027 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 995/ 159576 | consumed samples: 15920 | elapsed time per iteration (ms): 13541.6 | learning rate: 4.416E-06 | global batch size: 16 | lm loss: 7.511089E+00 | loss scale: 8192.0 | grad norm: 27156.396 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 996/ 159576 | consumed samples: 15936 | elapsed time per iteration (ms): 13577.4 | learning rate: 4.420E-06 | global batch size: 16 | lm loss: 7.332575E+00 | loss scale: 8192.0 | grad norm: 26750.652 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 997/ 159576 | consumed samples: 15952 | elapsed time per iteration (ms): 13524.5 | learning rate: 4.425E-06 | global batch size: 16 | lm loss: 7.478838E+00 | loss scale: 8192.0 | grad norm: 30934.480 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 998/ 159576 | consumed samples: 15968 | elapsed time per iteration (ms): 13570.2 | learning rate: 4.429E-06 | global batch size: 16 | lm loss: 7.363966E+00 | loss scale: 8192.0 | grad norm: 26717.522 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 999/ 159576 | consumed samples: 15984 | elapsed time per iteration (ms): 13808.8 | learning rate: 4.433E-06 | global batch size: 16 | lm loss: 7.504936E+00 | loss scale: 8192.0 | grad norm: 33504.939 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1000/ 159576 | consumed samples: 16000 | elapsed time per iteration 
(ms): 13740.5 | learning rate: 4.438E-06 | global batch size: 16 | lm loss: 7.441235E+00 | loss scale: 16384.0 | grad norm: 39922.218 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) ------------------------------------------------------------------------------------------------ validation loss at iteration 1000 | lm loss value: 7.422922E+00 | lm loss PPL: 1.673917E+03 | ------------------------------------------------------------------------------------------------ iteration 1001/ 159576 | consumed samples: 16016 | elapsed time per iteration (ms): 18607.4 | learning rate: 4.442E-06 | global batch size: 16 | lm loss: 7.375732E+00 | loss scale: 16384.0 | grad norm: 55247.055 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1002/ 159576 | consumed samples: 16032 | elapsed time per iteration (ms): 13593.5 | learning rate: 4.447E-06 | global batch size: 16 | lm loss: 7.377642E+00 | loss scale: 16384.0 | grad norm: 69178.499 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1003/ 159576 | consumed samples: 16048 | elapsed time per iteration (ms): 13772.4 | learning rate: 4.451E-06 | global batch size: 16 | lm loss: 7.399412E+00 | loss scale: 16384.0 | grad norm: 56841.570 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1004/ 159576 | consumed samples: 16064 | elapsed time per iteration (ms): 13547.9 | learning rate: 4.456E-06 | global batch size: 16 | lm loss: 7.476449E+00 | loss scale: 16384.0 | grad norm: 53109.525 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1005/ 159576 | consumed samples: 16080 | elapsed time per iteration (ms): 13546.4 | learning rate: 4.460E-06 | global batch size: 16 | lm loss: 7.394112E+00 | loss scale: 16384.0 | grad norm: 62368.875 | num zeros: 0.0 | number of skipped iterations: 0 | 
number of nan iterations: 0 | time (ms) iteration 1006/ 159576 | consumed samples: 16096 | elapsed time per iteration (ms): 13685.8 | learning rate: 4.464E-06 | global batch size: 16 | lm loss: 7.426886E+00 | loss scale: 16384.0 | grad norm: 57003.932 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1007/ 159576 | consumed samples: 16112 | elapsed time per iteration (ms): 14078.3 | learning rate: 4.469E-06 | global batch size: 16 | lm loss: 7.601004E+00 | loss scale: 16384.0 | grad norm: 62664.778 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1008/ 159576 | consumed samples: 16128 | elapsed time per iteration (ms): 13787.6 | learning rate: 4.473E-06 | global batch size: 16 | lm loss: 7.774883E+00 | loss scale: 16384.0 | grad norm: 97296.354 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1009/ 159576 | consumed samples: 16144 | elapsed time per iteration (ms): 13687.7 | learning rate: 4.478E-06 | global batch size: 16 | lm loss: 7.604346E+00 | loss scale: 16384.0 | grad norm: 65941.448 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1010/ 159576 | consumed samples: 16160 | elapsed time per iteration (ms): 13703.4 | learning rate: 4.482E-06 | global batch size: 16 | lm loss: 7.360181E+00 | loss scale: 16384.0 | grad norm: 64245.298 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1011/ 159576 | consumed samples: 16176 | elapsed time per iteration (ms): 14077.4 | learning rate: 4.487E-06 | global batch size: 16 | lm loss: 7.590093E+00 | loss scale: 16384.0 | grad norm: 66963.039 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1012/ 159576 | consumed samples: 16192 | elapsed time per iteration (ms): 13697.2 | learning rate: 4.491E-06 | global 
batch size: 16 | lm loss: 7.648331E+00 | loss scale: 16384.0 | grad norm: 62407.028 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1013/ 159576 | consumed samples: 16208 | elapsed time per iteration (ms): 13676.8 | learning rate: 4.496E-06 | global batch size: 16 | lm loss: 7.462048E+00 | loss scale: 16384.0 | grad norm: 76557.598 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1014/ 159576 | consumed samples: 16224 | elapsed time per iteration (ms): 13713.9 | learning rate: 4.500E-06 | global batch size: 16 | lm loss: 7.345057E+00 | loss scale: 16384.0 | grad norm: 58991.980 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1015/ 159576 | consumed samples: 16240 | elapsed time per iteration (ms): 13740.6 | learning rate: 4.504E-06 | global batch size: 16 | lm loss: 7.369339E+00 | loss scale: 16384.0 | grad norm: 76798.488 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1016/ 159576 | consumed samples: 16256 | elapsed time per iteration (ms): 13921.9 | learning rate: 4.509E-06 | global batch size: 16 | lm loss: 7.564117E+00 | loss scale: 16384.0 | grad norm: 64166.866 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1017/ 159576 | consumed samples: 16272 | elapsed time per iteration (ms): 13632.9 | learning rate: 4.513E-06 | global batch size: 16 | lm loss: 7.610378E+00 | loss scale: 16384.0 | grad norm: 65353.003 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1018/ 159576 | consumed samples: 16288 | elapsed time per iteration (ms): 13686.4 | learning rate: 4.518E-06 | global batch size: 16 | lm loss: 7.676594E+00 | loss scale: 16384.0 | grad norm: 64547.303 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | 
time (ms) iteration 1019/ 159576 | consumed samples: 16304 | elapsed time per iteration (ms): 13717.6 | learning rate: 4.522E-06 | global batch size: 16 | lm loss: 7.406422E+00 | loss scale: 16384.0 | grad norm: 63594.322 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1020/ 159576 | consumed samples: 16320 | elapsed time per iteration (ms): 13939.6 | learning rate: 4.527E-06 | global batch size: 16 | lm loss: 7.459125E+00 | loss scale: 16384.0 | grad norm: 59823.821 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1021/ 159576 | consumed samples: 16336 | elapsed time per iteration (ms): 13792.3 | learning rate: 4.531E-06 | global batch size: 16 | lm loss: 7.471806E+00 | loss scale: 16384.0 | grad norm: 56872.925 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1022/ 159576 | consumed samples: 16352 | elapsed time per iteration (ms): 13687.8 | learning rate: 4.536E-06 | global batch size: 16 | lm loss: 7.110139E+00 | loss scale: 16384.0 | grad norm: 58937.652 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1023/ 159576 | consumed samples: 16368 | elapsed time per iteration (ms): 13711.6 | learning rate: 4.540E-06 | global batch size: 16 | lm loss: 7.428498E+00 | loss scale: 16384.0 | grad norm: 57885.296 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1024/ 159576 | consumed samples: 16384 | elapsed time per iteration (ms): 14207.9 | learning rate: 4.544E-06 | global batch size: 16 | lm loss: 7.374810E+00 | loss scale: 16384.0 | grad norm: 56855.393 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1025/ 159576 | consumed samples: 16400 | elapsed time per iteration (ms): 13557.2 | learning rate: 4.549E-06 | global batch size: 16 | lm loss: 
7.597025E+00 | loss scale: 16384.0 | grad norm: 57119.291 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1026/ 159576 | consumed samples: 16416 | elapsed time per iteration (ms): 13700.8 | learning rate: 4.553E-06 | global batch size: 16 | lm loss: 7.473170E+00 | loss scale: 16384.0 | grad norm: 61762.366 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1027/ 159576 | consumed samples: 16432 | elapsed time per iteration (ms): 13696.5 | learning rate: 4.558E-06 | global batch size: 16 | lm loss: 7.410631E+00 | loss scale: 16384.0 | grad norm: 63393.977 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1028/ 159576 | consumed samples: 16448 | elapsed time per iteration (ms): 13664.5 | learning rate: 4.562E-06 | global batch size: 16 | lm loss: 7.475993E+00 | loss scale: 16384.0 | grad norm: 61819.228 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1029/ 159576 | consumed samples: 16464 | elapsed time per iteration (ms): 13836.3 | learning rate: 4.567E-06 | global batch size: 16 | lm loss: 7.464800E+00 | loss scale: 16384.0 | grad norm: 52336.012 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1030/ 159576 | consumed samples: 16480 | elapsed time per iteration (ms): 13692.5 | learning rate: 4.571E-06 | global batch size: 16 | lm loss: 7.449406E+00 | loss scale: 16384.0 | grad norm: 66491.596 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1031/ 159576 | consumed samples: 16496 | elapsed time per iteration (ms): 13635.2 | learning rate: 4.575E-06 | global batch size: 16 | lm loss: 7.519850E+00 | loss scale: 16384.0 | grad norm: 65780.303 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1032/ 
159576 | consumed samples: 16512 | elapsed time per iteration (ms): 13708.9 | learning rate: 4.580E-06 | global batch size: 16 | lm loss: 7.513804E+00 | loss scale: 16384.0 | grad norm: 62434.258 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1033/ 159576 | consumed samples: 16528 | elapsed time per iteration (ms): 13952.8 | learning rate: 4.584E-06 | global batch size: 16 | lm loss: 7.405169E+00 | loss scale: 16384.0 | grad norm: 74264.401 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1034/ 159576 | consumed samples: 16544 | elapsed time per iteration (ms): 13788.4 | learning rate: 4.589E-06 | global batch size: 16 | lm loss: 7.367761E+00 | loss scale: 16384.0 | grad norm: 75791.477 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1035/ 159576 | consumed samples: 16560 | elapsed time per iteration (ms): 13716.5 | learning rate: 4.593E-06 | global batch size: 16 | lm loss: 7.513783E+00 | loss scale: 16384.0 | grad norm: 91765.458 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1036/ 159576 | consumed samples: 16576 | elapsed time per iteration (ms): 13658.1 | learning rate: 4.598E-06 | global batch size: 16 | lm loss: 7.556536E+00 | loss scale: 16384.0 | grad norm: 76354.552 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1037/ 159576 | consumed samples: 16592 | elapsed time per iteration (ms): 13995.5 | learning rate: 4.602E-06 | global batch size: 16 | lm loss: 7.423755E+00 | loss scale: 16384.0 | grad norm: 70528.206 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1038/ 159576 | consumed samples: 16608 | elapsed time per iteration (ms): 13797.2 | learning rate: 4.607E-06 | global batch size: 16 | lm loss: 7.452043E+00 | loss scale: 
16384.0 | grad norm: 63200.280 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1039/ 159576 | consumed samples: 16624 | elapsed time per iteration (ms): 13728.6 | learning rate: 4.611E-06 | global batch size: 16 | lm loss: 7.310857E+00 | loss scale: 16384.0 | grad norm: 135045.434 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1040/ 159576 | consumed samples: 16640 | elapsed time per iteration (ms): 13690.2 | learning rate: 4.615E-06 | global batch size: 16 | lm loss: 7.374257E+00 | loss scale: 16384.0 | grad norm: 69159.214 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1041/ 159576 | consumed samples: 16656 | elapsed time per iteration (ms): 13682.9 | learning rate: 4.620E-06 | global batch size: 16 | lm loss: 7.498551E+00 | loss scale: 16384.0 | grad norm: 67982.272 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1042/ 159576 | consumed samples: 16672 | elapsed time per iteration (ms): 13991.8 | learning rate: 4.624E-06 | global batch size: 16 | lm loss: 7.373695E+00 | loss scale: 16384.0 | grad norm: 75175.434 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1043/ 159576 | consumed samples: 16688 | elapsed time per iteration (ms): 13721.4 | learning rate: 4.629E-06 | global batch size: 16 | lm loss: 7.642927E+00 | loss scale: 16384.0 | grad norm: 103318.209 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1044/ 159576 | consumed samples: 16704 | elapsed time per iteration (ms): 13718.3 | learning rate: 4.633E-06 | global batch size: 16 | lm loss: 7.423826E+00 | loss scale: 16384.0 | grad norm: 71060.972 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1045/ 159576 | consumed samples: 
16720 | elapsed time per iteration (ms): 13604.4 | learning rate: 4.638E-06 | global batch size: 16 | lm loss: 7.362212E+00 | loss scale: 16384.0 | grad norm: 81169.902 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1046/ 159576 | consumed samples: 16736 | elapsed time per iteration (ms): 14075.1 | learning rate: 4.642E-06 | global batch size: 16 | lm loss: 7.450203E+00 | loss scale: 16384.0 | grad norm: 83510.606 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1047/ 159576 | consumed samples: 16752 | elapsed time per iteration (ms): 13677.3 | learning rate: 4.646E-06 | global batch size: 16 | lm loss: 7.554290E+00 | loss scale: 16384.0 | grad norm: 81988.459 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1048/ 159576 | consumed samples: 16768 | elapsed time per iteration (ms): 13606.4 | learning rate: 4.651E-06 | global batch size: 16 | lm loss: 7.327914E+00 | loss scale: 16384.0 | grad norm: 71618.221 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1049/ 159576 | consumed samples: 16784 | elapsed time per iteration (ms): 13669.1 | learning rate: 4.655E-06 | global batch size: 16 | lm loss: 7.596028E+00 | loss scale: 16384.0 | grad norm: 76665.796 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1050/ 159576 | consumed samples: 16800 | elapsed time per iteration (ms): 13708.7 | learning rate: 4.660E-06 | global batch size: 16 | lm loss: 7.326102E+00 | loss scale: 16384.0 | grad norm: 83331.339 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1051/ 159576 | consumed samples: 16816 | elapsed time per iteration (ms): 13981.1 | learning rate: 4.664E-06 | global batch size: 16 | lm loss: 7.619492E+00 | loss scale: 16384.0 | grad norm: 82397.650 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1052/ 159576 | consumed samples: 16832 | elapsed time per iteration (ms): 13516.4 | learning rate: 4.669E-06 | global batch size: 16 | lm loss: 7.530663E+00 | loss scale: 16384.0 | grad norm: 56319.745 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1053/ 159576 | consumed samples: 16848 | elapsed time per iteration (ms): 13647.6 | learning rate: 4.673E-06 | global batch size: 16 | lm loss: 7.443875E+00 | loss scale: 16384.0 | grad norm: 72562.436 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1054/ 159576 | consumed samples: 16864 | elapsed time per iteration (ms): 13627.5 | learning rate: 4.678E-06 | global batch size: 16 | lm loss: 7.479875E+00 | loss scale: 16384.0 | grad norm: 61495.093 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1055/ 159576 | consumed samples: 16880 | elapsed time per iteration (ms): 14065.0 | learning rate: 4.682E-06 | global batch size: 16 | lm loss: 7.612121E+00 | loss scale: 16384.0 | grad norm: 112310.814 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1056/ 159576 | consumed samples: 16896 | elapsed time per iteration (ms): 13707.4 | learning rate: 4.686E-06 | global batch size: 16 | lm loss: 7.408166E+00 | loss scale: 16384.0 | grad norm: 92018.659 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1057/ 159576 | consumed samples: 16912 | elapsed time per iteration (ms): 13656.1 | learning rate: 4.691E-06 | global batch size: 16 | lm loss: 7.422934E+00 | loss scale: 16384.0 | grad norm: 67279.309 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1058/ 159576 | consumed samples: 16928 | elapsed time per iteration (ms): 13676.8 | learning rate: 4.695E-06 | global batch size: 16 | lm loss: 7.397638E+00 | loss scale: 16384.0 | grad norm: 87601.196 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1059/ 159576 | consumed samples: 16944 | elapsed time per iteration (ms): 14053.0 | learning rate: 4.700E-06 | global batch size: 16 | lm loss: 7.514566E+00 | loss scale: 16384.0 | grad norm: 115639.831 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1060/ 159576 | consumed samples: 16960 | elapsed time per iteration (ms): 13722.6 | learning rate: 4.704E-06 | global batch size: 16 | lm loss: 7.310302E+00 | loss scale: 16384.0 | grad norm: 142865.091 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1061/ 159576 | consumed samples: 16976 | elapsed time per iteration (ms): 13679.9 | learning rate: 4.709E-06 | global batch size: 16 | lm loss: 7.399222E+00 | loss scale: 16384.0 | grad norm: 100646.221 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1062/ 159576 | consumed samples: 16992 | elapsed time per iteration (ms): 13634.5 | learning rate: 4.713E-06 | global batch size: 16 | lm loss: 7.332808E+00 | loss scale: 16384.0 | grad norm: 66218.286 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1063/ 159576 | consumed samples: 17008 | elapsed time per iteration (ms): 13663.6 | learning rate: 4.717E-06 | global batch size: 16 | lm loss: 7.490856E+00 | loss scale: 16384.0 | grad norm: 127442.068 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1064/ 159576 | consumed samples: 17024 | elapsed time per iteration (ms): 13909.0 | learning rate: 4.722E-06 | global batch size: 16 | lm loss: 7.693977E+00 | loss scale: 16384.0 | grad norm: 101533.485 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1065/ 159576 | consumed samples: 17040 | elapsed time per iteration (ms): 13658.8 | learning rate: 4.726E-06 | global batch size: 16 | lm loss: 7.565272E+00 | loss scale: 16384.0 | grad norm: 87035.171 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1066/ 159576 | consumed samples: 17056 | elapsed time per iteration (ms): 13679.2 | learning rate: 4.731E-06 | global batch size: 16 | lm loss: 7.790638E+00 | loss scale: 16384.0 | grad norm: 86411.886 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1067/ 159576 | consumed samples: 17072 | elapsed time per iteration (ms): 13759.2 | learning rate: 4.735E-06 | global batch size: 16 | lm loss: 7.438931E+00 | loss scale: 16384.0 | grad norm: 65756.645 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1068/ 159576 | consumed samples: 17088 | elapsed time per iteration (ms): 14138.1 | learning rate: 4.740E-06 | global batch size: 16 | lm loss: 7.361547E+00 | loss scale: 16384.0 | grad norm: 130711.456 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1069/ 159576 | consumed samples: 17104 | elapsed time per iteration (ms): 13687.8 | learning rate: 4.744E-06 | global batch size: 16 | lm loss: 7.413251E+00 | loss scale: 16384.0 | grad norm: 58324.579 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1070/ 159576 | consumed samples: 17120 | elapsed time per iteration (ms): 13637.9 | learning rate: 4.749E-06 | global batch size: 16 | lm loss: 7.397507E+00 | loss scale: 16384.0 | grad norm: 89260.600 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1071/ 159576 | consumed samples: 17136 | elapsed time per iteration (ms): 13680.2 | learning rate: 4.753E-06 | global batch size: 16 | lm loss: 7.535676E+00 | loss scale: 16384.0 | grad norm: 74408.995 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1072/ 159576 | consumed samples: 17152 | elapsed time per iteration (ms): 14062.2 | learning rate: 4.757E-06 | global batch size: 16 | lm loss: 7.411667E+00 | loss scale: 16384.0 | grad norm: 77225.681 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1073/ 159576 | consumed samples: 17168 | elapsed time per iteration (ms): 13681.2 | learning rate: 4.762E-06 | global batch size: 16 | lm loss: 7.394706E+00 | loss scale: 16384.0 | grad norm: 78590.421 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1074/ 159576 | consumed samples: 17184 | elapsed time per iteration (ms): 13709.1 | learning rate: 4.766E-06 | global batch size: 16 | lm loss: 7.616404E+00 | loss scale: 16384.0 | grad norm: 82722.799 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1075/ 159576 | consumed samples: 17200 | elapsed time per iteration (ms): 13743.2 | learning rate: 4.771E-06 | global batch size: 16 | lm loss: 7.395072E+00 | loss scale: 16384.0 | grad norm: 63549.807 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1076/ 159576 | consumed samples: 17216 | elapsed time per iteration (ms): 13619.1 | learning rate: 4.775E-06 | global batch size: 16 | lm loss: 7.593513E+00 | loss scale: 16384.0 | grad norm: 100985.259 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1077/ 159576 | consumed samples: 17232 | elapsed time per iteration (ms): 13859.6 | learning rate: 4.780E-06 | global batch size: 16 | lm loss: 7.379070E+00 | loss scale: 16384.0 | grad norm: 56935.671 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1078/ 159576 | consumed samples: 17248 | elapsed time per iteration (ms): 13589.7 | learning rate: 4.784E-06 | global batch size: 16 | lm loss: 7.412032E+00 | loss scale: 16384.0 | grad norm: 93391.483 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1079/ 159576 | consumed samples: 17264 | elapsed time per iteration (ms): 13575.0 | learning rate: 4.788E-06 | global batch size: 16 | lm loss: 7.485137E+00 | loss scale: 16384.0 | grad norm: 70759.989 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1080/ 159576 | consumed samples: 17280 | elapsed time per iteration (ms): 13590.9 | learning rate: 4.793E-06 | global batch size: 16 | lm loss: 7.410018E+00 | loss scale: 16384.0 | grad norm: 108070.843 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1081/ 159576 | consumed samples: 17296 | elapsed time per iteration (ms): 13934.8 | learning rate: 4.797E-06 | global batch size: 16 | lm loss: 7.444709E+00 | loss scale: 16384.0 | grad norm: 93912.071 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1082/ 159576 | consumed samples: 17312 | elapsed time per iteration (ms): 13598.4 | learning rate: 4.802E-06 | global batch size: 16 | lm loss: 7.532929E+00 | loss scale: 16384.0 | grad norm: 76683.978 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1083/ 159576 | consumed samples: 17328 | elapsed time per iteration (ms): 13510.5 | learning rate: 4.806E-06 | global batch size: 16 | lm loss: 7.599612E+00 | loss scale: 16384.0 | grad norm: 83858.264 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1084/ 159576 | consumed samples: 17344 | elapsed time per iteration (ms): 13542.7 | learning rate: 4.811E-06 | global batch size: 16 | lm loss: 7.387773E+00 | loss scale: 16384.0 | grad norm: 63120.576 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1085/ 159576 | consumed samples: 17360 | elapsed time per iteration (ms): 13555.5 | learning rate: 4.815E-06 | global batch size: 16 | lm loss: 7.289794E+00 | loss scale: 16384.0 | grad norm: 77022.669 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1086/ 159576 | consumed samples: 17376 | elapsed time per iteration (ms): 13932.5 | learning rate: 4.820E-06 | global batch size: 16 | lm loss: 7.393349E+00 | loss scale: 16384.0 | grad norm: 79433.611 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1087/ 159576 | consumed samples: 17392 | elapsed time per iteration (ms): 13479.9 | learning rate: 4.824E-06 | global batch size: 16 | lm loss: 7.321753E+00 | loss scale: 16384.0 | grad norm: 68970.976 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1088/ 159576 | consumed samples: 17408 | elapsed time per iteration (ms): 13681.0 | learning rate: 4.828E-06 | global batch size: 16 | lm loss: 7.320374E+00 | loss scale: 16384.0 | grad norm: 73549.447 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1089/ 159576 | consumed samples: 17424 | elapsed time per iteration (ms): 13654.0 | learning rate: 4.833E-06 | global batch size: 16 | lm loss: 7.605762E+00 | loss scale: 16384.0 | grad norm: 80374.482 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1090/ 159576 | consumed samples: 17440 | elapsed time per iteration (ms): 14059.3 | learning rate: 4.837E-06 | global batch size: 16 | lm loss: 7.631133E+00 | loss scale: 16384.0 | grad norm: 82954.080 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1091/ 159576 | consumed samples: 17456 | elapsed time per iteration (ms): 13724.8 | learning rate: 4.842E-06 | global batch size: 16 | lm loss: 7.507143E+00 | loss scale: 16384.0 | grad norm: 60066.048 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1092/ 159576 | consumed samples: 17472 | elapsed time per iteration (ms): 13461.4 | learning rate: 4.846E-06 | global batch size: 16 | lm loss: 7.300464E+00 | loss scale: 16384.0 | grad norm: 116487.793 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1093/ 159576 | consumed samples: 17488 | elapsed time per iteration (ms): 13525.0 | learning rate: 4.851E-06 | global batch size: 16 | lm loss: 7.388405E+00 | loss scale: 16384.0 | grad norm: 79147.305 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1094/ 159576 | consumed samples: 17504 | elapsed time per iteration (ms): 13950.4 | learning rate: 4.855E-06 | global batch size: 16 | lm loss: 7.471725E+00 | loss scale: 16384.0 | grad norm: 90987.897 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1095/ 159576 | consumed samples: 17520 | elapsed time per iteration (ms): 13624.6 | learning rate: 4.859E-06 | global batch size: 16 | lm loss: 7.530853E+00 | loss scale: 16384.0 | grad norm: 90057.826 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1096/ 159576 | consumed samples: 17536 | elapsed time per iteration (ms): 13591.9 | learning rate: 4.864E-06 | global batch size: 16 | lm loss: 7.420722E+00 | loss scale: 16384.0 | grad norm: 76037.442 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1097/ 159576 | consumed samples: 17552 | elapsed time per iteration (ms): 13587.0 | learning rate: 4.868E-06 | global batch size: 16 | lm loss: 7.363769E+00 | loss scale: 16384.0 | grad norm: 107388.359 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1098/ 159576 | consumed samples: 17568 | elapsed time per iteration (ms): 13667.8 | learning rate: 4.873E-06 | global batch size: 16 | lm loss: 7.310038E+00 | loss scale: 16384.0 | grad norm: 72408.477 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1099/ 159576 | consumed samples: 17584 | elapsed time per iteration (ms): 13707.4 | learning rate: 4.877E-06 | global batch size: 16 | lm loss: 7.291698E+00 | loss scale: 16384.0 | grad norm: 69292.261 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1100/ 159576 | consumed samples: 17600 | elapsed time per iteration (ms): 13564.5 | learning rate: 4.882E-06 | global batch size: 16 | lm loss: 7.713614E+00 | loss scale: 16384.0 | grad norm: 87150.289 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1101/ 159576 | consumed samples: 17616 | elapsed time per iteration (ms): 13621.9 | learning rate: 4.886E-06 | global batch size: 16 | lm loss: 7.482057E+00 | loss scale: 16384.0 | grad norm: 61713.123 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1102/ 159576 | consumed samples: 17632 | elapsed time per iteration (ms): 13628.2 | learning rate: 4.891E-06 | global batch size: 16 | lm loss: 7.370234E+00 | loss scale: 16384.0 | grad norm: 83708.630 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1103/ 159576 | consumed samples: 17648 | elapsed time per iteration (ms): 13962.7 | learning rate: 4.895E-06 | global batch size: 16 | lm loss: 7.373138E+00 | loss scale: 16384.0 | grad norm: 75905.969 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1104/ 159576 | consumed samples: 17664 | elapsed time per iteration (ms): 13627.3 | learning rate: 4.899E-06 | global batch size: 16 | lm loss: 7.448909E+00 | loss scale: 16384.0 | grad norm: 135141.473 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1105/ 159576 | consumed samples: 17680 | elapsed time per iteration (ms): 13640.6 | learning rate: 4.904E-06 | global batch size: 16 | lm loss: 7.252520E+00 | loss scale: 16384.0 | grad norm: 73661.038 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1106/ 159576 | consumed samples: 17696 | elapsed time per iteration (ms): 13666.3 | learning rate: 4.908E-06 | global batch size: 16 | lm loss: 7.507257E+00 | loss scale: 16384.0 | grad norm: 108098.635 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1107/ 159576 | consumed samples: 17712 | elapsed time per iteration (ms): 13849.3 | learning rate: 4.913E-06 | global batch size: 16 | lm loss: 7.429738E+00 | loss scale: 16384.0 | grad norm: 99851.193 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1108/ 159576 | consumed samples: 17728 | elapsed time per iteration (ms): 13862.9 | learning rate: 4.917E-06 | global batch size: 16 | lm loss: 7.422798E+00 | loss scale: 16384.0 | grad norm: 90788.540 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1109/ 159576 | consumed samples: 17744 | elapsed time per iteration (ms): 13640.2 | learning rate: 4.922E-06 | global batch size: 16 | lm loss: 7.656183E+00 | loss scale: 16384.0 | grad norm: 204462.632 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1110/ 159576 | consumed samples: 17760 | elapsed time per iteration (ms): 13627.0 | learning rate: 4.926E-06 | global batch size: 16 | lm loss: 7.576304E+00 | loss scale: 16384.0 | grad norm: 166002.012 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1111/ 159576 | consumed samples: 17776 | elapsed time per iteration (ms): 13632.9 | learning rate: 4.930E-06 | global batch size: 16 | lm loss: 7.626440E+00 | loss scale: 16384.0 | grad norm: 82466.643 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1112/ 159576 | consumed samples: 17792 | elapsed time per iteration (ms): 13939.0 | learning rate: 4.935E-06 | global batch size: 16 | lm loss: 7.302793E+00 | loss scale: 16384.0 | grad norm: 150100.520 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1113/ 159576 | consumed samples: 17808 | elapsed time per iteration (ms): 13640.4 | learning rate: 4.939E-06 | global batch size: 16 | lm loss: 7.493092E+00 | loss scale: 16384.0 | grad norm: 104956.045 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1114/ 159576 | consumed samples: 17824 | elapsed time per iteration (ms): 13637.6 | learning rate: 4.944E-06 | global batch size: 16 | lm loss: 7.475542E+00 | loss scale: 16384.0 | grad norm: 86316.213 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1115/ 159576 | consumed samples: 17840 | elapsed time per iteration (ms): 13630.5 | learning rate: 4.948E-06 | global batch size: 16 | lm loss: 7.367518E+00 | loss scale: 16384.0 | grad norm: 127229.616 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1116/ 159576 | consumed samples: 17856 | elapsed time per iteration (ms): 13929.1 | learning rate: 4.953E-06 | global batch size: 16 | lm loss: 7.463512E+00 | loss scale: 16384.0 | grad norm: 80765.100 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1117/ 159576 | consumed samples: 17872 | elapsed time per iteration (ms): 13651.9 | learning rate: 4.957E-06 | global batch size: 16 | lm loss: 7.389682E+00 | loss scale: 16384.0 | grad norm: 114274.057 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1118/ 159576 | consumed samples: 17888 | elapsed time per iteration (ms): 13673.8 | learning rate: 4.962E-06 | global batch size: 16 | lm loss: 7.446970E+00 | loss scale: 16384.0 | grad norm: 93011.728 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1119/ 159576 | consumed samples: 17904 | elapsed time per iteration (ms): 13700.2 | learning rate: 4.966E-06 | global batch size: 16 | lm loss: 7.314221E+00 | loss scale: 16384.0 | grad norm: 105575.833 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1120/ 159576 | consumed samples: 17920 | elapsed time per iteration (ms): 13702.7 | learning rate: 4.970E-06 | global batch size: 16 | lm loss: 7.372279E+00 | loss scale: 16384.0 | grad norm: 77507.701 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1121/ 159576 | consumed samples: 17936 | elapsed time per iteration (ms): 13869.6 | learning rate: 4.975E-06 | global batch size: 16 | lm loss: 7.535093E+00 | loss scale: 16384.0 | grad norm: 98620.342 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1122/ 159576 | consumed samples: 17952 | elapsed time per iteration (ms): 13679.6 | learning rate: 4.979E-06 | global batch size: 16 | lm loss: 8.079200E+00 | loss scale: 16384.0 | grad norm: 187332.489 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1123/ 159576 | consumed samples: 17968 | elapsed time per iteration (ms): 13672.8 | learning rate: 4.984E-06 | global batch size: 16 | lm loss: 7.433456E+00 | loss scale: 16384.0 | grad norm: 139834.433 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1124/ 159576 | consumed samples: 17984 | elapsed time per iteration (ms): 13651.7 | learning rate: 4.988E-06 | global batch size: 16 | lm loss: 7.440439E+00 | loss scale: 16384.0 | grad norm: 91486.607 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1125/ 159576 | consumed samples: 18000 | elapsed time per iteration (ms): 14085.1 | learning rate: 4.993E-06 | global batch size: 16 | lm loss: 7.453449E+00 | loss scale: 16384.0 | grad norm: 170685.218 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1126/ 159576 | consumed samples: 18016 | elapsed time per iteration (ms): 13744.0 | learning rate: 4.997E-06 | global batch size: 16 | lm loss: 7.544756E+00 | loss scale: 16384.0 | grad norm: 93482.948 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1127/ 159576 | consumed samples: 18032 | elapsed time per iteration (ms): 13666.9 | learning rate: 5.001E-06 | global batch size: 16 | lm loss: 7.435877E+00 | loss scale: 16384.0 | grad norm: 98259.154 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1128/ 159576 | consumed samples: 18048 | elapsed time per iteration (ms): 13692.7 | learning rate: 5.006E-06 | global batch size: 16 | lm loss: 7.496342E+00 | loss scale: 16384.0 | grad norm: 130279.795 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1129/ 159576 | consumed samples: 18064 | elapsed time per iteration (ms): 14100.4 | learning rate: 5.010E-06 | global batch size: 16 | lm loss: 7.501980E+00 | loss scale: 16384.0 | grad norm: 88561.836 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1130/ 159576 | consumed samples: 18080 | elapsed time per iteration (ms): 13620.7 | learning rate: 5.015E-06 | global batch size: 16 | lm loss: 7.470133E+00 | loss scale: 16384.0 | grad norm: 155289.997 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1131/ 159576 | consumed samples: 18096 | elapsed time per iteration (ms): 13683.0 | learning rate: 5.019E-06 | global batch size: 16 | lm loss: 7.539918E+00 | loss scale: 16384.0 | grad norm: 89135.032 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1132/ 159576 | consumed samples: 18112 | elapsed time per iteration (ms): 13643.2 | learning rate: 5.024E-06 | global batch size: 16 | lm loss: 7.537309E+00 | loss scale: 16384.0 | grad norm: 83460.414 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1133/ 159576 | consumed samples: 18128 | elapsed time per iteration (ms): 13758.8 | learning rate: 5.028E-06 | global batch size: 16 | lm loss: 7.445082E+00 | loss scale: 16384.0 | grad norm: 97599.513 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1134/ 159576 | consumed samples: 18144 | elapsed time per iteration (ms): 13842.3 | learning rate: 5.033E-06 | global batch size: 16 | lm loss: 7.533705E+00 | loss scale: 16384.0 | grad norm: 153106.257 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1135/ 159576 | consumed samples: 18160 | elapsed time per iteration (ms): 13641.3 | learning rate: 5.037E-06 | global batch size: 16 | lm loss: 7.351761E+00 | loss scale: 16384.0 | grad norm: 139552.025 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1136/ 159576 | consumed samples: 18176 | elapsed time per iteration (ms): 13757.6 | learning rate: 5.041E-06 | global batch size: 16 | lm loss: 7.386802E+00 | loss scale: 16384.0 | grad norm: 82271.014 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1137/ 159576 | consumed samples: 18192 | elapsed time per iteration (ms): 13590.7 | learning rate: 5.046E-06 | global batch size: 16 | lm loss: 7.276345E+00 | loss scale: 16384.0 | grad norm: 139306.896 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1138/ 159576 | consumed samples: 18208 | elapsed time per iteration (ms): 14099.6 | learning rate: 5.050E-06 | global batch size: 16 | lm loss: 7.489694E+00 | loss scale: 16384.0 | grad norm: 75568.533 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1139/ 159576 | consumed samples: 18224 | elapsed time per iteration (ms): 13765.0 | learning rate: 5.055E-06 | global batch size: 16 | lm loss: 6.968816E+00 | loss scale: 16384.0 | grad norm: 118020.093 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1140/ 159576 | consumed samples: 18240 | elapsed time per iteration (ms): 13662.4 | learning rate: 5.059E-06 | global batch size: 16 | lm loss: 7.446542E+00 | loss scale: 16384.0 | grad norm: 117497.431 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1141/ 159576 | consumed samples: 18256 | elapsed time per iteration (ms): 13747.0 | learning rate: 5.064E-06 | global batch size: 16 | lm loss: 7.328124E+00 | loss scale: 16384.0 | grad norm: 126653.284 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1142/ 159576 | consumed samples: 18272 | elapsed time per iteration (ms): 14086.2 | learning rate: 5.068E-06 | global batch size: 16 | lm loss: 7.359120E+00 | loss scale: 16384.0 | grad norm: 158587.176 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1143/ 159576 | consumed samples: 18288 | elapsed time per iteration (ms): 13785.6 | learning rate: 5.072E-06 | global batch size: 16 | lm loss: 7.289187E+00 | loss scale: 16384.0 | grad norm: 93193.500 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1144/ 159576 | consumed samples: 18304 | elapsed time per iteration (ms): 13650.1 | learning rate: 5.077E-06 | global batch size: 16 | lm loss: 7.541381E+00 | loss scale: 16384.0 | grad norm: 127276.458 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1145/ 159576 | consumed samples: 18320 | elapsed time per iteration (ms): 13673.3 | learning rate: 5.081E-06 | global batch size: 16 | lm loss: 7.343310E+00 | loss scale: 16384.0 | grad norm: 141086.682 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1146/ 159576 | consumed samples: 18336 | elapsed time per iteration (ms): 13709.3 | learning rate: 5.086E-06 | global batch size: 16 | lm loss: 7.291780E+00 | loss scale: 16384.0 | grad norm: 84706.443 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1147/ 159576 | consumed samples: 18352 | elapsed time per iteration (ms): 13798.7 | learning rate: 5.090E-06 | global batch size: 16 | lm loss: 7.395382E+00 | loss scale: 16384.0 | grad norm: 168181.547 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1148/ 159576 | consumed samples: 18368 | elapsed time per iteration (ms): 13678.3 | learning rate: 5.095E-06 | global batch size: 16 | lm loss: 7.287755E+00 | loss scale: 16384.0 | grad norm: 150595.173 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1149/ 159576 | consumed samples: 18384 | elapsed time per iteration (ms): 13705.6 | learning rate: 5.099E-06 | global batch size: 16 | lm loss: 7.521116E+00 | loss scale: 16384.0 | grad norm: 90594.393 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1150/ 159576 | consumed samples: 18400 | elapsed time per iteration (ms): 13724.2 | learning rate: 5.104E-06 | global batch size: 16 | lm loss: 7.560548E+00 | loss scale: 16384.0 | grad norm: 124093.174 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1151/ 159576 | consumed samples: 18416 | elapsed time per iteration (ms): 14011.4 | learning rate: 5.108E-06 | global batch size: 16 | lm loss: 7.334007E+00 | loss scale: 16384.0 | grad norm: 93590.799 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1152/ 159576 | consumed samples: 18432 | elapsed time per iteration (ms): 13638.1 | learning rate: 5.112E-06 | global batch size: 16 | lm loss: 7.340695E+00 | loss scale: 16384.0 | grad norm: 120515.541 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1153/ 159576 | consumed samples: 18448 | elapsed time per iteration (ms): 13670.9 | learning rate: 5.117E-06 | global batch size: 16 | lm loss: 7.310359E+00 | loss scale: 16384.0 | grad norm: 121580.561 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1154/ 159576 | consumed samples: 18464 | elapsed time per iteration (ms): 13692.4 | learning rate: 5.121E-06 | global batch size: 16 | lm loss: 7.407881E+00 | loss scale: 16384.0 | grad norm: 86210.472 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1155/ 159576 | consumed samples: 18480 | elapsed time per iteration (ms): 14124.7 | learning rate: 5.126E-06 | global batch size: 16 | lm loss: 7.533539E+00 | loss scale: 16384.0 | grad norm: 117499.375 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1156/ 159576 | consumed samples: 18496 | elapsed time per iteration (ms): 13713.9 | learning rate: 5.130E-06 | global batch size: 16 | lm loss: 7.454373E+00 | loss scale: 16384.0 | grad norm: 82164.881 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1157/ 159576 | consumed samples: 18512 | elapsed time per iteration (ms): 13665.0 | learning rate: 5.135E-06 | global batch size: 16 | lm loss: 6.997806E+00 | loss scale: 16384.0 | grad norm: 118291.842 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1158/ 159576 | consumed samples: 18528 | elapsed time per iteration (ms): 13620.7 | learning rate: 5.139E-06 | global batch size: 16 | lm loss: 7.155181E+00 | loss scale: 16384.0 | grad norm: 80841.378 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1159/ 159576 | consumed samples: 18544 | elapsed time per iteration (ms): 13522.0 | learning rate: 5.143E-06 | global batch size: 16 | lm loss: 7.303053E+00 | loss scale: 16384.0 | grad norm: 153692.954 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1160/ 159576 | consumed samples: 18560 | elapsed time per iteration (ms): 13934.6 | learning rate: 5.148E-06 | global batch size: 16 | lm loss: 7.453541E+00 | loss scale: 16384.0 | grad norm: 178564.006 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1161/ 159576 | consumed samples: 18576 | elapsed time per iteration (ms): 13591.1 | learning rate: 5.152E-06 | global batch size: 16 | lm loss: 7.370741E+00 | loss scale: 16384.0 | grad norm: 96828.834 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1162/ 159576 | consumed samples: 18592 | elapsed time per iteration (ms): 13610.9 | learning rate: 5.157E-06 | global batch size: 16 | lm loss: 7.395625E+00 | loss scale: 16384.0 | grad norm: 138531.373 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1163/ 159576 | consumed samples: 18608 | elapsed time per iteration (ms): 13633.4 | learning rate: 5.161E-06 | global batch size: 16 | lm loss: 7.721334E+00 | loss scale: 16384.0 | grad norm: 107198.076 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1164/ 159576 | consumed samples: 18624 | elapsed time per iteration (ms): 13919.7 | learning rate: 5.166E-06 | global batch size: 16 | lm loss: 7.418262E+00 | loss scale: 16384.0 | grad norm: 104593.384 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1165/ 159576 | consumed samples: 18640 | elapsed time per iteration (ms): 13699.8 | learning rate: 5.170E-06 | global batch size: 16 | lm loss: 7.388452E+00 | loss scale: 16384.0 | grad norm: 87922.625 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1166/ 159576 | consumed samples: 18656 | elapsed time per iteration (ms): 13567.0 | learning rate: 5.175E-06 | global batch size: 16 | lm loss: 7.359789E+00 | loss scale: 16384.0 | grad norm: 167490.320 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1167/ 159576 | consumed samples: 18672 | elapsed time per iteration (ms): 13665.3 | learning rate: 5.179E-06 | global batch size: 16 | lm loss: 7.513920E+00 | loss scale: 16384.0 | grad norm: 187148.881 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1168/ 159576 | consumed samples: 18688 | elapsed time per iteration (ms): 13712.9 | learning rate: 5.183E-06 | global batch size: 16 | lm loss: 7.333634E+00 | loss scale: 16384.0 | grad norm: 80524.927 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1169/ 159576 | consumed samples: 18704 | elapsed time per iteration (ms): 13807.4 | learning rate: 5.188E-06 | global batch size: 16 | lm loss: 7.551642E+00 | loss scale: 16384.0 | grad norm: 96715.430 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1170/ 159576 | consumed samples: 18720 | elapsed time per iteration (ms): 13672.0 | learning rate: 5.192E-06 | global batch size: 16 | lm loss: 7.354926E+00 | loss scale: 16384.0 | grad norm: 108931.618 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1171/ 159576 | consumed samples: 18736 | elapsed time per iteration (ms): 13735.2 | learning rate: 5.197E-06 | global batch size: 16 | lm loss: 7.360828E+00 | loss scale: 16384.0 | grad norm: 93043.561 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1172/ 159576 | consumed samples: 18752 | elapsed time per iteration (ms): 13717.8 | learning rate: 5.201E-06 | global batch size: 16 | lm loss: 7.538117E+00 | loss scale: 16384.0 | grad norm: 318365.891 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1173/ 159576 | consumed samples: 18768 | elapsed time per iteration (ms): 13883.3 | learning rate: 5.206E-06 | global batch size: 16 | lm loss: 7.601986E+00 | loss scale: 16384.0 | grad norm: 139775.022 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1174/ 159576 | consumed samples: 18784 | elapsed time per iteration (ms): 13707.5 | learning rate: 5.210E-06 | global batch size: 16 | lm loss: 7.492588E+00 | loss scale: 16384.0 | grad norm: 90689.919 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1175/ 159576 | consumed samples: 18800 | elapsed time per iteration (ms): 13678.7 | learning rate: 5.214E-06 | global batch size: 16 | lm loss: 7.586353E+00 | loss scale: 16384.0 | grad norm: 123587.039 | num zeros: 0.0 | number of skipped iterations: 0 | number of
nan iterations: 0 | time (ms) iteration 1176/ 159576 | consumed samples: 18816 | elapsed time per iteration (ms): 13643.8 | learning rate: 5.219E-06 | global batch size: 16 | lm loss: 7.585982E+00 | loss scale: 16384.0 | grad norm: 134121.461 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1177/ 159576 | consumed samples: 18832 | elapsed time per iteration (ms): 13876.6 | learning rate: 5.223E-06 | global batch size: 16 | lm loss: 7.290177E+00 | loss scale: 16384.0 | grad norm: 61795.500 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1178/ 159576 | consumed samples: 18848 | elapsed time per iteration (ms): 13887.6 | learning rate: 5.228E-06 | global batch size: 16 | lm loss: 7.394442E+00 | loss scale: 16384.0 | grad norm: 214580.050 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1179/ 159576 | consumed samples: 18864 | elapsed time per iteration (ms): 13671.2 | learning rate: 5.232E-06 | global batch size: 16 | lm loss: 7.342830E+00 | loss scale: 16384.0 | grad norm: 170377.555 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1180/ 159576 | consumed samples: 18880 | elapsed time per iteration (ms): 13615.6 | learning rate: 5.237E-06 | global batch size: 16 | lm loss: 7.353875E+00 | loss scale: 16384.0 | grad norm: 98364.101 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1181/ 159576 | consumed samples: 18896 | elapsed time per iteration (ms): 13659.2 | learning rate: 5.241E-06 | global batch size: 16 | lm loss: 7.310112E+00 | loss scale: 16384.0 | grad norm: 153347.882 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1182/ 159576 | consumed samples: 18912 | elapsed time per iteration (ms): 13718.2 | learning rate: 5.246E-06 | global batch 
size: 16 | lm loss: 7.516181E+00 | loss scale: 16384.0 | grad norm: 183425.509 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1183/ 159576 | consumed samples: 18928 | elapsed time per iteration (ms): 13614.7 | learning rate: 5.250E-06 | global batch size: 16 | lm loss: 7.284205E+00 | loss scale: 16384.0 | grad norm: 116539.767 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1184/ 159576 | consumed samples: 18944 | elapsed time per iteration (ms): 13636.1 | learning rate: 5.254E-06 | global batch size: 16 | lm loss: 7.392292E+00 | loss scale: 16384.0 | grad norm: 167498.612 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1185/ 159576 | consumed samples: 18960 | elapsed time per iteration (ms): 13633.9 | learning rate: 5.259E-06 | global batch size: 16 | lm loss: 7.250909E+00 | loss scale: 16384.0 | grad norm: 100955.402 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1186/ 159576 | consumed samples: 18976 | elapsed time per iteration (ms): 13999.4 | learning rate: 5.263E-06 | global batch size: 16 | lm loss: 7.536862E+00 | loss scale: 16384.0 | grad norm: 100050.160 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1187/ 159576 | consumed samples: 18992 | elapsed time per iteration (ms): 13653.6 | learning rate: 5.268E-06 | global batch size: 16 | lm loss: 7.565104E+00 | loss scale: 16384.0 | grad norm: 118619.018 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1188/ 159576 | consumed samples: 19008 | elapsed time per iteration (ms): 13606.5 | learning rate: 5.272E-06 | global batch size: 16 | lm loss: 7.258739E+00 | loss scale: 16384.0 | grad norm: 126790.154 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | 
time (ms) iteration 1189/ 159576 | consumed samples: 19024 | elapsed time per iteration (ms): 13571.9 | learning rate: 5.277E-06 | global batch size: 16 | lm loss: 7.184493E+00 | loss scale: 16384.0 | grad norm: 84818.036 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1190/ 159576 | consumed samples: 19040 | elapsed time per iteration (ms): 13962.3 | learning rate: 5.281E-06 | global batch size: 16 | lm loss: 7.209998E+00 | loss scale: 16384.0 | grad norm: 131280.260 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1191/ 159576 | consumed samples: 19056 | elapsed time per iteration (ms): 13770.8 | learning rate: 5.286E-06 | global batch size: 16 | lm loss: 7.406217E+00 | loss scale: 16384.0 | grad norm: 110178.484 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1192/ 159576 | consumed samples: 19072 | elapsed time per iteration (ms): 13665.3 | learning rate: 5.290E-06 | global batch size: 16 | lm loss: 7.350411E+00 | loss scale: 16384.0 | grad norm: 81228.032 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1193/ 159576 | consumed samples: 19088 | elapsed time per iteration (ms): 13585.9 | learning rate: 5.294E-06 | global batch size: 16 | lm loss: 7.583058E+00 | loss scale: 16384.0 | grad norm: 291080.363 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1194/ 159576 | consumed samples: 19104 | elapsed time per iteration (ms): 13658.0 | learning rate: 5.299E-06 | global batch size: 16 | lm loss: 7.808938E+00 | loss scale: 16384.0 | grad norm: 193632.364 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1195/ 159576 | consumed samples: 19120 | elapsed time per iteration (ms): 13777.0 | learning rate: 5.303E-06 | global batch size: 16 | lm loss: 
7.459247E+00 | loss scale: 16384.0 | grad norm: 100738.405 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1196/ 159576 | consumed samples: 19136 | elapsed time per iteration (ms): 13624.3 | learning rate: 5.308E-06 | global batch size: 16 | lm loss: 7.240894E+00 | loss scale: 16384.0 | grad norm: 102223.561 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1197/ 159576 | consumed samples: 19152 | elapsed time per iteration (ms): 13630.2 | learning rate: 5.312E-06 | global batch size: 16 | lm loss: 7.469604E+00 | loss scale: 16384.0 | grad norm: 91547.502 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1198/ 159576 | consumed samples: 19168 | elapsed time per iteration (ms): 13603.4 | learning rate: 5.317E-06 | global batch size: 16 | lm loss: 7.399169E+00 | loss scale: 16384.0 | grad norm: 246196.581 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1199/ 159576 | consumed samples: 19184 | elapsed time per iteration (ms): 14028.5 | learning rate: 5.321E-06 | global batch size: 16 | lm loss: 7.465099E+00 | loss scale: 16384.0 | grad norm: 185665.583 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1200/ 159576 | consumed samples: 19200 | elapsed time per iteration (ms): 13601.1 | learning rate: 5.325E-06 | global batch size: 16 | lm loss: 7.383169E+00 | loss scale: 16384.0 | grad norm: 115872.720 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1201/ 159576 | consumed samples: 19216 | elapsed time per iteration (ms): 13566.6 | learning rate: 5.330E-06 | global batch size: 16 | lm loss: 7.352910E+00 | loss scale: 16384.0 | grad norm: 114834.353 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 
1202/ 159576 | consumed samples: 19232 | elapsed time per iteration (ms): 13557.4 | learning rate: 5.334E-06 | global batch size: 16 | lm loss: 7.521720E+00 | loss scale: 16384.0 | grad norm: 101976.012 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1203/ 159576 | consumed samples: 19248 | elapsed time per iteration (ms): 13525.0 | learning rate: 5.339E-06 | global batch size: 16 | lm loss: 7.225696E+00 | loss scale: 16384.0 | grad norm: 178745.243 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1204/ 159576 | consumed samples: 19264 | elapsed time per iteration (ms): 13539.3 | learning rate: 5.343E-06 | global batch size: 16 | lm loss: 7.375963E+00 | loss scale: 16384.0 | grad norm: 175723.616 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1205/ 159576 | consumed samples: 19280 | elapsed time per iteration (ms): 13532.3 | learning rate: 5.348E-06 | global batch size: 16 | lm loss: 7.402988E+00 | loss scale: 16384.0 | grad norm: 104645.448 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1206/ 159576 | consumed samples: 19296 | elapsed time per iteration (ms): 13502.9 | learning rate: 5.352E-06 | global batch size: 16 | lm loss: 7.302839E+00 | loss scale: 16384.0 | grad norm: 99328.230 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1207/ 159576 | consumed samples: 19312 | elapsed time per iteration (ms): 13540.4 | learning rate: 5.357E-06 | global batch size: 16 | lm loss: 7.555269E+00 | loss scale: 16384.0 | grad norm: 89166.858 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1208/ 159576 | consumed samples: 19328 | elapsed time per iteration (ms): 13900.0 | learning rate: 5.361E-06 | global batch size: 16 | lm loss: 7.459805E+00 | loss 
scale: 16384.0 | grad norm: 135152.393 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1209/ 159576 | consumed samples: 19344 | elapsed time per iteration (ms): 13560.6 | learning rate: 5.365E-06 | global batch size: 16 | lm loss: 7.419579E+00 | loss scale: 16384.0 | grad norm: 101249.512 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1210/ 159576 | consumed samples: 19360 | elapsed time per iteration (ms): 13658.8 | learning rate: 5.370E-06 | global batch size: 16 | lm loss: 7.348646E+00 | loss scale: 16384.0 | grad norm: 104483.609 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1211/ 159576 | consumed samples: 19376 | elapsed time per iteration (ms): 13533.6 | learning rate: 5.374E-06 | global batch size: 16 | lm loss: 7.494230E+00 | loss scale: 16384.0 | grad norm: 110210.437 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1212/ 159576 | consumed samples: 19392 | elapsed time per iteration (ms): 13905.0 | learning rate: 5.379E-06 | global batch size: 16 | lm loss: 7.390188E+00 | loss scale: 16384.0 | grad norm: 96645.582 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1213/ 159576 | consumed samples: 19408 | elapsed time per iteration (ms): 13673.2 | learning rate: 5.383E-06 | global batch size: 16 | lm loss: 7.318599E+00 | loss scale: 16384.0 | grad norm: 166216.352 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1214/ 159576 | consumed samples: 19424 | elapsed time per iteration (ms): 13582.9 | learning rate: 5.388E-06 | global batch size: 16 | lm loss: 7.262068E+00 | loss scale: 16384.0 | grad norm: 75724.522 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1215/ 159576 | consumed 
samples: 19440 | elapsed time per iteration (ms): 13570.1 | learning rate: 5.392E-06 | global batch size: 16 | lm loss: 7.594563E+00 | loss scale: 16384.0 | grad norm: 95306.819 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1216/ 159576 | consumed samples: 19456 | elapsed time per iteration (ms): 13639.7 | learning rate: 5.396E-06 | global batch size: 16 | lm loss: 7.375734E+00 | loss scale: 16384.0 | grad norm: 86152.125 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1217/ 159576 | consumed samples: 19472 | elapsed time per iteration (ms): 14091.6 | learning rate: 5.401E-06 | global batch size: 16 | lm loss: 7.213047E+00 | loss scale: 16384.0 | grad norm: 95583.311 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1218/ 159576 | consumed samples: 19488 | elapsed time per iteration (ms): 13516.3 | learning rate: 5.405E-06 | global batch size: 16 | lm loss: 7.437682E+00 | loss scale: 16384.0 | grad norm: 221549.634 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1219/ 159576 | consumed samples: 19504 | elapsed time per iteration (ms): 13610.0 | learning rate: 5.410E-06 | global batch size: 16 | lm loss: 7.254605E+00 | loss scale: 16384.0 | grad norm: 97554.516 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1220/ 159576 | consumed samples: 19520 | elapsed time per iteration (ms): 13565.5 | learning rate: 5.414E-06 | global batch size: 16 | lm loss: 7.248229E+00 | loss scale: 16384.0 | grad norm: 89138.195 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1221/ 159576 | consumed samples: 19536 | elapsed time per iteration (ms): 13989.3 | learning rate: 5.419E-06 | global batch size: 16 | lm loss: 7.313151E+00 | loss scale: 16384.0 | grad norm: 
172651.828 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1222/ 159576 | consumed samples: 19552 | elapsed time per iteration (ms): 13602.4 | learning rate: 5.423E-06 | global batch size: 16 | lm loss: 7.476789E+00 | loss scale: 16384.0 | grad norm: 67387.822 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1223/ 159576 | consumed samples: 19568 | elapsed time per iteration (ms): 13656.0 | learning rate: 5.428E-06 | global batch size: 16 | lm loss: 7.289939E+00 | loss scale: 16384.0 | grad norm: 207125.248 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1224/ 159576 | consumed samples: 19584 | elapsed time per iteration (ms): 13537.8 | learning rate: 5.432E-06 | global batch size: 16 | lm loss: 7.409894E+00 | loss scale: 16384.0 | grad norm: 156218.537 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1225/ 159576 | consumed samples: 19600 | elapsed time per iteration (ms): 13600.0 | learning rate: 5.436E-06 | global batch size: 16 | lm loss: 7.226832E+00 | loss scale: 16384.0 | grad norm: 93258.536 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1226/ 159576 | consumed samples: 19616 | elapsed time per iteration (ms): 13778.7 | learning rate: 5.441E-06 | global batch size: 16 | lm loss: 7.406470E+00 | loss scale: 16384.0 | grad norm: 95037.623 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1227/ 159576 | consumed samples: 19632 | elapsed time per iteration (ms): 13609.5 | learning rate: 5.445E-06 | global batch size: 16 | lm loss: 7.385060E+00 | loss scale: 16384.0 | grad norm: 77831.367 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1228/ 159576 | consumed samples: 19648 | elapsed time 
per iteration (ms): 13561.8 | learning rate: 5.450E-06 | global batch size: 16 | lm loss: 7.283795E+00 | loss scale: 16384.0 | grad norm: 219813.514 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1229/ 159576 | consumed samples: 19664 | elapsed time per iteration (ms): 13619.4 | learning rate: 5.454E-06 | global batch size: 16 | lm loss: 7.344219E+00 | loss scale: 16384.0 | grad norm: 122192.335 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1230/ 159576 | consumed samples: 19680 | elapsed time per iteration (ms): 14054.6 | learning rate: 5.459E-06 | global batch size: 16 | lm loss: 7.364305E+00 | loss scale: 16384.0 | grad norm: 90944.731 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1231/ 159576 | consumed samples: 19696 | elapsed time per iteration (ms): 13589.9 | learning rate: 5.463E-06 | global batch size: 16 | lm loss: 7.421730E+00 | loss scale: 16384.0 | grad norm: 178816.259 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1232/ 159576 | consumed samples: 19712 | elapsed time per iteration (ms): 13624.6 | learning rate: 5.467E-06 | global batch size: 16 | lm loss: 7.278720E+00 | loss scale: 16384.0 | grad norm: 101190.498 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1233/ 159576 | consumed samples: 19728 | elapsed time per iteration (ms): 13574.7 | learning rate: 5.472E-06 | global batch size: 16 | lm loss: 7.525582E+00 | loss scale: 16384.0 | grad norm: 95476.386 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1234/ 159576 | consumed samples: 19744 | elapsed time per iteration (ms): 13981.0 | learning rate: 5.476E-06 | global batch size: 16 | lm loss: 7.294508E+00 | loss scale: 16384.0 | grad norm: 110379.726 | num zeros: 0.0 
| number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1235/ 159576 | consumed samples: 19760 | elapsed time per iteration (ms): 13641.1 | learning rate: 5.481E-06 | global batch size: 16 | lm loss: 7.431972E+00 | loss scale: 16384.0 | grad norm: 103188.497 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1236/ 159576 | consumed samples: 19776 | elapsed time per iteration (ms): 13575.4 | learning rate: 5.485E-06 | global batch size: 16 | lm loss: 7.397687E+00 | loss scale: 16384.0 | grad norm: 92125.975 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1237/ 159576 | consumed samples: 19792 | elapsed time per iteration (ms): 13672.0 | learning rate: 5.490E-06 | global batch size: 16 | lm loss: 7.314774E+00 | loss scale: 16384.0 | grad norm: 75870.645 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1238/ 159576 | consumed samples: 19808 | elapsed time per iteration (ms): 13509.4 | learning rate: 5.494E-06 | global batch size: 16 | lm loss: 7.187806E+00 | loss scale: 16384.0 | grad norm: 173296.806 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1239/ 159576 | consumed samples: 19824 | elapsed time per iteration (ms): 13875.3 | learning rate: 5.499E-06 | global batch size: 16 | lm loss: 7.376097E+00 | loss scale: 16384.0 | grad norm: 133632.906 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1240/ 159576 | consumed samples: 19840 | elapsed time per iteration (ms): 13610.1 | learning rate: 5.503E-06 | global batch size: 16 | lm loss: 7.267582E+00 | loss scale: 16384.0 | grad norm: 85104.985 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1241/ 159576 | consumed samples: 19856 | elapsed time per iteration (ms): 
13551.5 | learning rate: 5.507E-06 | global batch size: 16 | lm loss: 7.352735E+00 | loss scale: 16384.0 | grad norm: 90699.366 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1242/ 159576 | consumed samples: 19872 | elapsed time per iteration (ms): 13593.9 | learning rate: 5.512E-06 | global batch size: 16 | lm loss: 7.468503E+00 | loss scale: 16384.0 | grad norm: 83188.176 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1243/ 159576 | consumed samples: 19888 | elapsed time per iteration (ms): 13930.9 | learning rate: 5.516E-06 | global batch size: 16 | lm loss: 7.214951E+00 | loss scale: 16384.0 | grad norm: 78366.480 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1244/ 159576 | consumed samples: 19904 | elapsed time per iteration (ms): 13652.1 | learning rate: 5.521E-06 | global batch size: 16 | lm loss: 7.260246E+00 | loss scale: 16384.0 | grad norm: 80928.941 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) [2021-09-24 07:03:47] PULSE: tr8-104B is waiting for the previous Job Array job to finish before scheduling a new one (1162855_[2-10%1] on 'gpu_p13' partition) [2021-09-24 07:03:47] PULSE: tr8-104B is running for 1:11:36 since 2021-09-24T05:52:11 (1162855_1 on 'gpu_p13' partition (r6i4n[5,7],r6i5n[2,7-8],r6i6n[0,2,6],r7i2n[4-5],r7i6n[2-4],r7i7n[7-8],r8i0n[2-3,5-8],r8i1n[0,2-4],r8i2n8,r8i3n[0-2],r8i5n[3-4],r8i7n[3-8],r9i0n[0-2],r9i1n[0-3],r9i2n[3-5,8],r9i3n[0-1,7-8],r9i4n[0-2],r9i5n[3-8],r9i6n[0,7-8]) iteration 1245/ 159576 | consumed samples: 19920 | elapsed time per iteration (ms): 13521.2 | learning rate: 5.525E-06 | global batch size: 16 | lm loss: 7.539850E+00 | loss scale: 16384.0 | grad norm: 85379.198 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1246/ 159576 | consumed samples: 19936 | 
elapsed time per iteration (ms): 13540.5 | learning rate: 5.530E-06 | global batch size: 16 | lm loss: 7.541747E+00 | loss scale: 16384.0 | grad norm: 112594.519 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1247/ 159576 | consumed samples: 19952 | elapsed time per iteration (ms): 13599.8 | learning rate: 5.534E-06 | global batch size: 16 | lm loss: 7.427727E+00 | loss scale: 16384.0 | grad norm: 75830.490 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1248/ 159576 | consumed samples: 19968 | elapsed time per iteration (ms): 13827.8 | learning rate: 5.538E-06 | global batch size: 16 | lm loss: 7.407825E+00 | loss scale: 16384.0 | grad norm: 125194.168 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1249/ 159576 | consumed samples: 19984 | elapsed time per iteration (ms): 13505.2 | learning rate: 5.543E-06 | global batch size: 16 | lm loss: 7.566711E+00 | loss scale: 16384.0 | grad norm: 116825.251 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1250/ 159576 | consumed samples: 20000 | elapsed time per iteration (ms): 13584.6 | learning rate: 5.547E-06 | global batch size: 16 | lm loss: 7.156756E+00 | loss scale: 16384.0 | grad norm: 75875.506 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1251/ 159576 | consumed samples: 20016 | elapsed time per iteration (ms): 13599.4 | learning rate: 5.552E-06 | global batch size: 16 | lm loss: 7.355666E+00 | loss scale: 16384.0 | grad norm: 128516.877 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1252/ 159576 | consumed samples: 20032 | elapsed time per iteration (ms): 13882.6 | learning rate: 5.556E-06 | global batch size: 16 | lm loss: 7.339529E+00 | loss scale: 16384.0 | grad norm: 92000.517 | 
num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1253/ 159576 | consumed samples: 20048 | elapsed time per iteration (ms): 13669.5 | learning rate: 5.561E-06 | global batch size: 16 | lm loss: 7.246970E+00 | loss scale: 16384.0 | grad norm: 68938.329 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1254/ 159576 | consumed samples: 20064 | elapsed time per iteration (ms): 13534.9 | learning rate: 5.565E-06 | global batch size: 16 | lm loss: 7.505607E+00 | loss scale: 16384.0 | grad norm: 103078.555 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1255/ 159576 | consumed samples: 20080 | elapsed time per iteration (ms): 13594.8 | learning rate: 5.570E-06 | global batch size: 16 | lm loss: 7.386476E+00 | loss scale: 16384.0 | grad norm: 104529.887 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1256/ 159576 | consumed samples: 20096 | elapsed time per iteration (ms): 13795.8 | learning rate: 5.574E-06 | global batch size: 16 | lm loss: 7.263406E+00 | loss scale: 16384.0 | grad norm: 82840.246 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1257/ 159576 | consumed samples: 20112 | elapsed time per iteration (ms): 13529.7 | learning rate: 5.578E-06 | global batch size: 16 | lm loss: 7.356731E+00 | loss scale: 16384.0 | grad norm: 64612.754 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1258/ 159576 | consumed samples: 20128 | elapsed time per iteration (ms): 13538.7 | learning rate: 5.583E-06 | global batch size: 16 | lm loss: 7.516888E+00 | loss scale: 16384.0 | grad norm: 136048.030 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1259/ 159576 | consumed samples: 20144 | elapsed time per 
iteration (ms): 13556.0 | learning rate: 5.587E-06 | global batch size: 16 | lm loss: 7.352553E+00 | loss scale: 16384.0 | grad norm: 81380.126 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1260/ 159576 | consumed samples: 20160 | elapsed time per iteration (ms): 13488.1 | learning rate: 5.592E-06 | global batch size: 16 | lm loss: 7.385587E+00 | loss scale: 16384.0 | grad norm: 121637.321 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1261/ 159576 | consumed samples: 20176 | elapsed time per iteration (ms): 13803.4 | learning rate: 5.596E-06 | global batch size: 16 | lm loss: 7.280743E+00 | loss scale: 16384.0 | grad norm: 89726.532 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1262/ 159576 | consumed samples: 20192 | elapsed time per iteration (ms): 13426.2 | learning rate: 5.601E-06 | global batch size: 16 | lm loss: 7.512013E+00 | loss scale: 16384.0 | grad norm: 85518.754 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1263/ 159576 | consumed samples: 20208 | elapsed time per iteration (ms): 13492.1 | learning rate: 5.605E-06 | global batch size: 16 | lm loss: 7.145048E+00 | loss scale: 16384.0 | grad norm: 112279.366 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1264/ 159576 | consumed samples: 20224 | elapsed time per iteration (ms): 13537.9 | learning rate: 5.609E-06 | global batch size: 16 | lm loss: 7.608912E+00 | loss scale: 16384.0 | grad norm: 96612.876 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1265/ 159576 | consumed samples: 20240 | elapsed time per iteration (ms): 13857.6 | learning rate: 5.614E-06 | global batch size: 16 | lm loss: 7.316525E+00 | loss scale: 16384.0 | grad norm: 73736.489 | num zeros: 0.0 | 
number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)

Fields constant across iterations 1266-1402: global batch size: 16 | loss scale: 16384.0 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0

iteration (of 159576) | consumed samples | elapsed time per iteration (ms) | learning rate | lm loss | grad norm
1266 | 20256 | 13475.3 | 5.618E-06 | 7.406303E+00 | 69485.433
1267 | 20272 | 13513.4 | 5.623E-06 | 7.282144E+00 | 72619.526
1268 | 20288 | 13517.8 | 5.627E-06 | 7.419368E+00 | 107085.697
1269 | 20304 | 13507.2 | 5.632E-06 | 7.427319E+00 | 75455.531
1270 | 20320 | 13744.8 | 5.636E-06 | 7.348005E+00 | 119801.062
1271 | 20336 | 13569.3 | 5.641E-06 | 7.365005E+00 | 64957.880
1272 | 20352 | 13569.6 | 5.645E-06 | 7.429317E+00 | 178872.228
1273 | 20368 | 13472.8 | 5.649E-06 | 7.312444E+00 | 131489.957
1274 | 20384 | 14043.7 | 5.654E-06 | 7.280907E+00 | 80742.529
1275 | 20400 | 13515.6 | 5.658E-06 | 7.473969E+00 | 192617.575
1276 | 20416 | 13555.1 | 5.663E-06 | 7.571683E+00 | 142231.827
1277 | 20432 | 13684.0 | 5.667E-06 | 7.370350E+00 | 91290.772
1278 | 20448 | 14108.9 | 5.672E-06 | 7.258504E+00 | 111985.269
1279 | 20464 | 13599.8 | 5.676E-06 | 7.378584E+00 | 101238.659
1280 | 20480 | 13689.3 | 5.680E-06 | 7.344358E+00 | 131175.820
1281 | 20496 | 13675.0 | 5.685E-06 | 7.253249E+00 | 81245.877
1282 | 20512 | 13723.8 | 5.689E-06 | 7.385771E+00 | 80281.812
1283 | 20528 | 13839.8 | 5.694E-06 | 7.253633E+00 | 106168.685
1284 | 20544 | 13645.0 | 5.698E-06 | 7.091393E+00 | 119249.818
1285 | 20560 | 13673.3 | 5.703E-06 | 7.346157E+00 | 87118.195
1286 | 20576 | 13680.7 | 5.707E-06 | 7.301017E+00 | 66813.094
1287 | 20592 | 14107.0 | 5.712E-06 | 7.228415E+00 | 90274.274
1288 | 20608 | 13593.6 | 5.716E-06 | 7.412420E+00 | 74854.970
1289 | 20624 | 13657.4 | 5.720E-06 | 7.296477E+00 | 78756.807
1290 | 20640 | 13628.7 | 5.725E-06 | 7.091270E+00 | 77550.258
1291 | 20656 | 13654.9 | 5.729E-06 | 7.247941E+00 | 140565.268
1292 | 20672 | 13789.5 | 5.734E-06 | 7.326149E+00 | 66170.421
1293 | 20688 | 13629.3 | 5.738E-06 | 7.358797E+00 | 94692.189
1294 | 20704 | 13584.0 | 5.743E-06 | 7.254357E+00 | 69169.193
1295 | 20720 | 13612.6 | 5.747E-06 | 7.449785E+00 | 180039.609
1296 | 20736 | 13948.4 | 5.751E-06 | 7.506041E+00 | 147606.074
1297 | 20752 | 13604.2 | 5.756E-06 | 7.265352E+00 | 87511.848
1298 | 20768 | 13622.0 | 5.760E-06 | 7.446327E+00 | 91155.668
1299 | 20784 | 13674.5 | 5.765E-06 | 7.469901E+00 | 219048.196
1300 | 20800 | 13848.4 | 5.769E-06 | 7.389014E+00 | 84402.094
1301 | 20816 | 13625.0 | 5.774E-06 | 7.303530E+00 | 174901.504
1302 | 20832 | 13624.5 | 5.778E-06 | 7.358258E+00 | 146018.382
1303 | 20848 | 13602.8 | 5.783E-06 | 7.337800E+00 | 109327.316
1304 | 20864 | 13628.1 | 5.787E-06 | 7.310088E+00 | 83547.733
1305 | 20880 | 13754.8 | 5.791E-06 | 7.464965E+00 | 695515.315
1306 | 20896 | 13652.7 | 5.796E-06 | 7.764376E+00 | 569876.871
1307 | 20912 | 13609.0 | 5.800E-06 | 7.550226E+00 | 356748.186
1308 | 20928 | 13602.6 | 5.805E-06 | 7.402792E+00 | 159267.929
1309 | 20944 | 13968.8 | 5.809E-06 | 7.204682E+00 | 129995.340
1310 | 20960 | 13646.5 | 5.814E-06 | 7.591084E+00 | 143380.550
1311 | 20976 | 13595.1 | 5.818E-06 | 7.316426E+00 | 150593.992
1312 | 20992 | 13595.5 | 5.822E-06 | 7.305964E+00 | 177049.360
1313 | 21008 | 13979.9 | 5.827E-06 | 7.567747E+00 | 169809.702
1314 | 21024 | 13640.7 | 5.831E-06 | 7.395080E+00 | 145564.791
1315 | 21040 | 13592.0 | 5.836E-06 | 7.317047E+00 | 104694.703
1316 | 21056 | 13586.9 | 5.840E-06 | 7.255484E+00 | 93976.240
1317 | 21072 | 13589.9 | 5.845E-06 | 7.440733E+00 | 181969.447
1318 | 21088 | 13777.5 | 5.849E-06 | 7.425194E+00 | 109784.173
1319 | 21104 | 13622.9 | 5.854E-06 | 7.338997E+00 | 146618.704
1320 | 21120 | 13655.9 | 5.858E-06 | 7.517268E+00 | 108508.882
1321 | 21136 | 13535.6 | 5.862E-06 | 7.358712E+00 | 100699.582
1322 | 21152 | 13935.1 | 5.867E-06 | 7.184452E+00 | 85896.066
1323 | 21168 | 13612.2 | 5.871E-06 | 7.388680E+00 | 283765.557
1324 | 21184 | 13600.2 | 5.876E-06 | 7.594103E+00 | 191758.573
1325 | 21200 | 13592.0 | 5.880E-06 | 7.443296E+00 | 112255.550
1326 | 21216 | 13594.2 | 5.885E-06 | 7.192332E+00 | 110320.623
1327 | 21232 | 13762.8 | 5.889E-06 | 8.096416E+00 | 131448.164
1328 | 21248 | 13579.8 | 5.893E-06 | 7.433802E+00 | 182837.970
1329 | 21264 | 13581.7 | 5.898E-06 | 7.172110E+00 | 100348.173
1330 | 21280 | 13583.6 | 5.902E-06 | 7.240623E+00 | 100150.341
1331 | 21296 | 14102.4 | 5.907E-06 | 7.203824E+00 | 241560.384
1332 | 21312 | 13644.3 | 5.911E-06 | 7.245723E+00 | 129411.280
1333 | 21328 | 13656.6 | 5.916E-06 | 7.574631E+00 | 172987.034
1334 | 21344 | 13588.8 | 5.920E-06 | 7.287757E+00 | 99651.568
1335 | 21360 | 14011.8 | 5.925E-06 | 7.268057E+00 | 109280.402
1336 | 21376 | 13624.4 | 5.929E-06 | 7.062439E+00 | 160438.049
1337 | 21392 | 13544.1 | 5.933E-06 | 7.233086E+00 | 175313.966
1338 | 21408 | 13619.6 | 5.938E-06 | 7.333053E+00 | 104091.148
1339 | 21424 | 13622.4 | 5.942E-06 | 7.263519E+00 | 90175.391
1340 | 21440 | 13736.6 | 5.947E-06 | 7.445864E+00 | 136689.970
1341 | 21456 | 13686.3 | 5.951E-06 | 7.362231E+00 | 184602.422
1342 | 21472 | 13488.8 | 5.956E-06 | 7.368071E+00 | 82633.413
1343 | 21488 | 13605.8 | 5.960E-06 | 7.327272E+00 | 92741.507
1344 | 21504 | 14069.0 | 5.964E-06 | 7.323634E+00 | 99780.106
1345 | 21520 | 13450.7 | 5.969E-06 | 7.741362E+00 | 105396.793
1346 | 21536 | 13598.3 | 5.973E-06 | 7.280247E+00 | 77724.692
1347 | 21552 | 13585.6 | 5.978E-06 | 7.398378E+00 | 69954.709
1348 | 21568 | 13610.3 | 5.982E-06 | 7.321609E+00 | 94086.734
1349 | 21584 | 13777.1 | 5.987E-06 | 7.188628E+00 | 81475.279
1350 | 21600 | 13566.9 | 5.991E-06 | 7.515175E+00 | 78780.993
1351 | 21616 | 13622.9 | 5.996E-06 | 7.231083E+00 | 86153.703
1352 | 21632 | 13562.3 | 6.000E-06 | 7.206710E+00 | 83949.216
1353 | 21648 | 13968.8 | 6.004E-06 | 7.293135E+00 | 83956.626
1354 | 21664 | 13680.7 | 6.009E-06 | 7.282973E+00 | 102770.063
1355 | 21680 | 13601.4 | 6.013E-06 | 7.427012E+00 | 87455.923
1356 | 21696 | 13542.1 | 6.018E-06 | 7.529208E+00 | 83130.183
1357 | 21712 | 13961.0 | 6.022E-06 | 7.327049E+00 | 77841.440
1358 | 21728 | 13587.5 | 6.027E-06 | 7.267120E+00 | 86295.759
1359 | 21744 | 13505.9 | 6.031E-06 | 7.190462E+00 | 154865.118
1360 | 21760 | 13616.0 | 6.036E-06 | 7.321602E+00 | 112461.941
1361 | 21776 | 13547.3 | 6.040E-06 | 7.145373E+00 | 72055.762
1362 | 21792 | 13692.3 | 6.044E-06 | 7.077173E+00 | 103896.131
1363 | 21808 | 13612.5 | 6.049E-06 | 7.245114E+00 | 79354.159
1364 | 21824 | 13541.3 | 6.053E-06 | 7.281060E+00 | 148274.049
1365 | 21840 | 13609.2 | 6.058E-06 | 7.401906E+00 | 119123.195
1366 | 21856 | 13916.7 | 6.062E-06 | 7.338102E+00 | 93708.417
1367 | 21872 | 13536.5 | 6.067E-06 | 7.494397E+00 | 130779.852
1368 | 21888 | 13577.1 | 6.071E-06 | 7.007359E+00 | 94271.242
1369 | 21904 | 13571.4 | 6.075E-06 | 7.129241E+00 | 129962.794
1370 | 21920 | 13603.2 | 6.080E-06 | 7.323318E+00 | 138541.774
1371 | 21936 | 13998.6 | 6.084E-06 | 7.164912E+00 | 95366.588
1372 | 21952 | 13587.8 | 6.089E-06 | 7.207436E+00 | 95481.009
1373 | 21968 | 13570.1 | 6.093E-06 | 7.245305E+00 | 110814.337
1374 | 21984 | 13553.5 | 6.098E-06 | 7.184179E+00 | 92107.034
1375 | 22000 | 13994.4 | 6.102E-06 | 7.117487E+00 | 77237.913
1376 | 22016 | 13625.6 | 6.107E-06 | 7.445632E+00 | 139111.184
1377 | 22032 | 13559.3 | 6.111E-06 | 7.513434E+00 | 111307.588
1378 | 22048 | 13608.4 | 6.115E-06 | 7.255265E+00 | 88273.307
1379 | 22064 | 14048.5 | 6.120E-06 | 7.123577E+00 | 85346.614
1380 | 22080 | 13485.1 | 6.124E-06 | 7.134797E+00 | 118284.165
1381 | 22096 | 13616.6 | 6.129E-06 | 7.281054E+00 | 88229.446
1382 | 22112 | 13576.6 | 6.133E-06 | 7.397271E+00 | 130821.847
1383 | 22128 | 13587.8 | 6.138E-06 | 7.362026E+00 | 83450.672
1384 | 22144 | 13848.8 | 6.142E-06 | 7.275143E+00 | 86287.774
1385 | 22160 | 13576.9 | 6.146E-06 | 7.400926E+00 | 98321.914
1386 | 22176 | 13627.2 | 6.151E-06 | 7.151899E+00 | 85060.501
1387 | 22192 | 13519.4 | 6.155E-06 | 7.335835E+00 | 64450.517
1388 | 22208 | 13906.1 | 6.160E-06 | 7.316273E+00 | 66517.199
1389 | 22224 | 13589.2 | 6.164E-06 | 7.190707E+00 | 123710.931
1390 | 22240 | 13545.5 | 6.169E-06 | 7.337936E+00 | 78178.349
1391 | 22256 | 13564.6 | 6.173E-06 | 7.539785E+00 | 111563.102
1392 | 22272 | 13891.4 | 6.178E-06 | 7.071362E+00 | 70647.575
1393 | 22288 | 13681.2 | 6.182E-06 | 7.133610E+00 | 124103.863
1394 | 22304 | 13531.0 | 6.186E-06 | 7.323411E+00 | 99951.813
1395 | 22320 | 13568.0 | 6.191E-06 | 7.184701E+00 | 71905.862
1396 | 22336 | 13541.4 | 6.195E-06 | 7.166233E+00 | 81874.132
1397 | 22352 | 13897.4 | 6.200E-06 | 7.247505E+00 | 84059.366
1398 | 22368 | 13621.5 | 6.204E-06 | 7.240150E+00 | 119489.831
1399 | 22384 | 13579.9 | 6.209E-06 | 7.294222E+00 | 80417.137
1400 | 22400 | 13625.0 | 6.213E-06 | 7.203695E+00 | 97654.667
1401 | 22416 | 14002.5 | 6.217E-06 | 7.173908E+00 | 72597.723
1402 | 22432 | 13559.2 | 6.222E-06 | 7.213487E+00 | 108337.821
time (ms)
iteration 1403/ 159576 | consumed samples: 22448 | elapsed time per iteration (ms): 13615.0 | learning rate: 6.226E-06 | global batch size: 16 | lm loss: 7.295056E+00 | loss scale: 16384.0 | grad norm: 109464.933 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1404/ 159576 | consumed samples: 22464 | elapsed time per iteration (ms): 13479.3 | learning rate: 6.231E-06 | global batch size: 16 | lm loss: 7.070762E+00 | loss scale: 16384.0 | grad norm: 70008.382 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1405/ 159576 | consumed samples: 22480 | elapsed time per iteration (ms): 13573.2 | learning rate: 6.235E-06 | global batch size: 16 | lm loss: 7.206651E+00 | loss scale: 16384.0 | grad norm: 71456.680 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1406/ 159576 | consumed samples: 22496 | elapsed time per iteration (ms): 13670.7 | learning rate: 6.240E-06 | global batch size: 16 | lm loss: 7.421339E+00 | loss scale: 16384.0 | grad norm: 81529.039 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1407/ 159576 | consumed samples: 22512 | elapsed time per iteration (ms): 13510.9 | learning rate: 6.244E-06 | global batch size: 16 | lm loss: 7.245395E+00 | loss scale: 16384.0 | grad norm: 120780.179 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1408/ 159576 | consumed samples: 22528 | elapsed time per iteration (ms): 13544.4 | learning rate: 6.249E-06 | global batch size: 16 | lm loss: 7.479702E+00 | loss scale: 16384.0 | grad norm: 98091.848 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1409/ 159576 | consumed samples: 22544 | elapsed time per iteration (ms): 13558.7 | learning rate: 6.253E-06 | global batch size: 16 | lm loss: 7.220355E+00 
| loss scale: 16384.0 | grad norm: 71818.367 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1410/ 159576 | consumed samples: 22560 | elapsed time per iteration (ms): 13949.7 | learning rate: 6.257E-06 | global batch size: 16 | lm loss: 7.381415E+00 | loss scale: 16384.0 | grad norm: 80168.457 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1411/ 159576 | consumed samples: 22576 | elapsed time per iteration (ms): 13573.4 | learning rate: 6.262E-06 | global batch size: 16 | lm loss: 7.330766E+00 | loss scale: 16384.0 | grad norm: 107261.861 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1412/ 159576 | consumed samples: 22592 | elapsed time per iteration (ms): 13522.9 | learning rate: 6.266E-06 | global batch size: 16 | lm loss: 7.378265E+00 | loss scale: 16384.0 | grad norm: 115619.714 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1413/ 159576 | consumed samples: 22608 | elapsed time per iteration (ms): 13584.4 | learning rate: 6.271E-06 | global batch size: 16 | lm loss: 7.202836E+00 | loss scale: 16384.0 | grad norm: 70230.767 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1414/ 159576 | consumed samples: 22624 | elapsed time per iteration (ms): 13797.1 | learning rate: 6.275E-06 | global batch size: 16 | lm loss: 7.202533E+00 | loss scale: 16384.0 | grad norm: 122640.667 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1415/ 159576 | consumed samples: 22640 | elapsed time per iteration (ms): 13736.9 | learning rate: 6.280E-06 | global batch size: 16 | lm loss: 7.271989E+00 | loss scale: 16384.0 | grad norm: 80706.550 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1416/ 159576 | 
consumed samples: 22656 | elapsed time per iteration (ms): 13603.3 | learning rate: 6.284E-06 | global batch size: 16 | lm loss: 7.350783E+00 | loss scale: 16384.0 | grad norm: 106402.600 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1417/ 159576 | consumed samples: 22672 | elapsed time per iteration (ms): 13663.2 | learning rate: 6.288E-06 | global batch size: 16 | lm loss: 7.629884E+00 | loss scale: 16384.0 | grad norm: 111978.514 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1418/ 159576 | consumed samples: 22688 | elapsed time per iteration (ms): 13512.0 | learning rate: 6.293E-06 | global batch size: 16 | lm loss: 7.276966E+00 | loss scale: 16384.0 | grad norm: 86564.098 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1419/ 159576 | consumed samples: 22704 | elapsed time per iteration (ms): 13947.9 | learning rate: 6.297E-06 | global batch size: 16 | lm loss: 7.109100E+00 | loss scale: 16384.0 | grad norm: 85621.258 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1420/ 159576 | consumed samples: 22720 | elapsed time per iteration (ms): 13554.6 | learning rate: 6.302E-06 | global batch size: 16 | lm loss: 7.234724E+00 | loss scale: 16384.0 | grad norm: 115238.437 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1421/ 159576 | consumed samples: 22736 | elapsed time per iteration (ms): 13608.2 | learning rate: 6.306E-06 | global batch size: 16 | lm loss: 7.134557E+00 | loss scale: 16384.0 | grad norm: 127475.605 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1422/ 159576 | consumed samples: 22752 | elapsed time per iteration (ms): 13564.6 | learning rate: 6.311E-06 | global batch size: 16 | lm loss: 7.096246E+00 | loss scale: 16384.0 | 
grad norm: 92678.765 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1423/ 159576 | consumed samples: 22768 | elapsed time per iteration (ms): 13993.7 | learning rate: 6.315E-06 | global batch size: 16 | lm loss: 7.215540E+00 | loss scale: 16384.0 | grad norm: 77823.778 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1424/ 159576 | consumed samples: 22784 | elapsed time per iteration (ms): 13635.8 | learning rate: 6.320E-06 | global batch size: 16 | lm loss: 7.332169E+00 | loss scale: 16384.0 | grad norm: 88585.736 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1425/ 159576 | consumed samples: 22800 | elapsed time per iteration (ms): 13477.0 | learning rate: 6.324E-06 | global batch size: 16 | lm loss: 7.224688E+00 | loss scale: 16384.0 | grad norm: 98593.171 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1426/ 159576 | consumed samples: 22816 | elapsed time per iteration (ms): 13579.9 | learning rate: 6.328E-06 | global batch size: 16 | lm loss: 7.330650E+00 | loss scale: 16384.0 | grad norm: 101929.983 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1427/ 159576 | consumed samples: 22832 | elapsed time per iteration (ms): 13559.4 | learning rate: 6.333E-06 | global batch size: 16 | lm loss: 7.261027E+00 | loss scale: 16384.0 | grad norm: 79893.479 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1428/ 159576 | consumed samples: 22848 | elapsed time per iteration (ms): 13656.6 | learning rate: 6.337E-06 | global batch size: 16 | lm loss: 7.050019E+00 | loss scale: 16384.0 | grad norm: 197668.137 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1429/ 159576 | consumed samples: 22864 | 
elapsed time per iteration (ms): 13549.3 | learning rate: 6.342E-06 | global batch size: 16 | lm loss: 7.283052E+00 | loss scale: 16384.0 | grad norm: 185482.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1430/ 159576 | consumed samples: 22880 | elapsed time per iteration (ms): 13566.6 | learning rate: 6.346E-06 | global batch size: 16 | lm loss: 7.251038E+00 | loss scale: 16384.0 | grad norm: 81246.801 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1431/ 159576 | consumed samples: 22896 | elapsed time per iteration (ms): 13626.6 | learning rate: 6.351E-06 | global batch size: 16 | lm loss: 7.363044E+00 | loss scale: 16384.0 | grad norm: 89555.992 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1432/ 159576 | consumed samples: 22912 | elapsed time per iteration (ms): 14023.4 | learning rate: 6.355E-06 | global batch size: 16 | lm loss: 7.350190E+00 | loss scale: 16384.0 | grad norm: 151476.896 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1433/ 159576 | consumed samples: 22928 | elapsed time per iteration (ms): 13376.0 | learning rate: 6.359E-06 | global batch size: 16 | lm loss: 7.294331E+00 | loss scale: 16384.0 | grad norm: 148300.162 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1434/ 159576 | consumed samples: 22944 | elapsed time per iteration (ms): 13594.6 | learning rate: 6.364E-06 | global batch size: 16 | lm loss: 7.178850E+00 | loss scale: 16384.0 | grad norm: 115814.774 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1435/ 159576 | consumed samples: 22960 | elapsed time per iteration (ms): 13589.5 | learning rate: 6.368E-06 | global batch size: 16 | lm loss: 7.174537E+00 | loss scale: 16384.0 | grad norm: 89057.264 | 
num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1436/ 159576 | consumed samples: 22976 | elapsed time per iteration (ms): 13854.5 | learning rate: 6.373E-06 | global batch size: 16 | lm loss: 7.455090E+00 | loss scale: 16384.0 | grad norm: 143357.692 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1437/ 159576 | consumed samples: 22992 | elapsed time per iteration (ms): 13800.5 | learning rate: 6.377E-06 | global batch size: 16 | lm loss: 7.230480E+00 | loss scale: 16384.0 | grad norm: 124647.889 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1438/ 159576 | consumed samples: 23008 | elapsed time per iteration (ms): 13574.3 | learning rate: 6.382E-06 | global batch size: 16 | lm loss: 7.214196E+00 | loss scale: 16384.0 | grad norm: 90534.924 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1439/ 159576 | consumed samples: 23024 | elapsed time per iteration (ms): 13559.7 | learning rate: 6.386E-06 | global batch size: 16 | lm loss: 7.228687E+00 | loss scale: 16384.0 | grad norm: 100823.134 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1440/ 159576 | consumed samples: 23040 | elapsed time per iteration (ms): 13580.1 | learning rate: 6.391E-06 | global batch size: 16 | lm loss: 7.297411E+00 | loss scale: 16384.0 | grad norm: 72207.799 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1441/ 159576 | consumed samples: 23056 | elapsed time per iteration (ms): 13763.6 | learning rate: 6.395E-06 | global batch size: 16 | lm loss: 7.403437E+00 | loss scale: 16384.0 | grad norm: 227400.170 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1442/ 159576 | consumed samples: 23072 | elapsed time per 
iteration (ms): 13606.0 | learning rate: 6.399E-06 | global batch size: 16 | lm loss: 7.267770E+00 | loss scale: 16384.0 | grad norm: 178424.275 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1443/ 159576 | consumed samples: 23088 | elapsed time per iteration (ms): 13579.5 | learning rate: 6.404E-06 | global batch size: 16 | lm loss: 7.196310E+00 | loss scale: 16384.0 | grad norm: 93737.230 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1444/ 159576 | consumed samples: 23104 | elapsed time per iteration (ms): 13564.8 | learning rate: 6.408E-06 | global batch size: 16 | lm loss: 7.180475E+00 | loss scale: 16384.0 | grad norm: 107567.132 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1445/ 159576 | consumed samples: 23120 | elapsed time per iteration (ms): 14086.1 | learning rate: 6.413E-06 | global batch size: 16 | lm loss: 7.235699E+00 | loss scale: 16384.0 | grad norm: 90017.706 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1446/ 159576 | consumed samples: 23136 | elapsed time per iteration (ms): 13420.4 | learning rate: 6.417E-06 | global batch size: 16 | lm loss: 7.131771E+00 | loss scale: 16384.0 | grad norm: 200715.783 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1447/ 159576 | consumed samples: 23152 | elapsed time per iteration (ms): 13582.8 | learning rate: 6.422E-06 | global batch size: 16 | lm loss: 7.147336E+00 | loss scale: 16384.0 | grad norm: 139041.379 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1448/ 159576 | consumed samples: 23168 | elapsed time per iteration (ms): 13591.5 | learning rate: 6.426E-06 | global batch size: 16 | lm loss: 7.223548E+00 | loss scale: 16384.0 | grad norm: 81314.906 | num zeros: 0.0 | 
number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1449/ 159576 | consumed samples: 23184 | elapsed time per iteration (ms): 13543.2 | learning rate: 6.430E-06 | global batch size: 16 | lm loss: 7.126436E+00 | loss scale: 16384.0 | grad norm: 104656.231 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1450/ 159576 | consumed samples: 23200 | elapsed time per iteration (ms): 13771.0 | learning rate: 6.435E-06 | global batch size: 16 | lm loss: 7.239769E+00 | loss scale: 16384.0 | grad norm: 55782.887 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1451/ 159576 | consumed samples: 23216 | elapsed time per iteration (ms): 13581.7 | learning rate: 6.439E-06 | global batch size: 16 | lm loss: 7.431156E+00 | loss scale: 16384.0 | grad norm: 265376.495 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1452/ 159576 | consumed samples: 23232 | elapsed time per iteration (ms): 13633.4 | learning rate: 6.444E-06 | global batch size: 16 | lm loss: 7.120412E+00 | loss scale: 16384.0 | grad norm: 153821.211 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1453/ 159576 | consumed samples: 23248 | elapsed time per iteration (ms): 13510.9 | learning rate: 6.448E-06 | global batch size: 16 | lm loss: 7.361814E+00 | loss scale: 16384.0 | grad norm: 91484.610 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1454/ 159576 | consumed samples: 23264 | elapsed time per iteration (ms): 14008.9 | learning rate: 6.453E-06 | global batch size: 16 | lm loss: 7.429213E+00 | loss scale: 16384.0 | grad norm: 95193.402 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1455/ 159576 | consumed samples: 23280 | elapsed time per iteration (ms): 13534.7 
| learning rate: 6.457E-06 | global batch size: 16 | lm loss: 7.311771E+00 | loss scale: 16384.0 | grad norm: 99688.210 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1456/ 159576 | consumed samples: 23296 | elapsed time per iteration (ms): 13570.9 | learning rate: 6.462E-06 | global batch size: 16 | lm loss: 7.326795E+00 | loss scale: 16384.0 | grad norm: 199002.918 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1457/ 159576 | consumed samples: 23312 | elapsed time per iteration (ms): 13567.6 | learning rate: 6.466E-06 | global batch size: 16 | lm loss: 7.238305E+00 | loss scale: 16384.0 | grad norm: 148524.516 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1458/ 159576 | consumed samples: 23328 | elapsed time per iteration (ms): 14002.9 | learning rate: 6.470E-06 | global batch size: 16 | lm loss: 7.170752E+00 | loss scale: 16384.0 | grad norm: 83892.787 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1459/ 159576 | consumed samples: 23344 | elapsed time per iteration (ms): 13758.9 | learning rate: 6.475E-06 | global batch size: 16 | lm loss: 7.148302E+00 | loss scale: 16384.0 | grad norm: 92326.384 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1460/ 159576 | consumed samples: 23360 | elapsed time per iteration (ms): 13596.9 | learning rate: 6.479E-06 | global batch size: 16 | lm loss: 7.386099E+00 | loss scale: 16384.0 | grad norm: 141912.785 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1461/ 159576 | consumed samples: 23376 | elapsed time per iteration (ms): 13627.4 | learning rate: 6.484E-06 | global batch size: 16 | lm loss: 7.288848E+00 | loss scale: 16384.0 | grad norm: 170265.777 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1462/ 159576 | consumed samples: 23392 | elapsed time per iteration (ms): 13618.4 | learning rate: 6.488E-06 | global batch size: 16 | lm loss: 7.229756E+00 | loss scale: 16384.0 | grad norm: 120999.804 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1463/ 159576 | consumed samples: 23408 | elapsed time per iteration (ms): 13656.7 | learning rate: 6.493E-06 | global batch size: 16 | lm loss: 7.281564E+00 | loss scale: 16384.0 | grad norm: 93039.502 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1464/ 159576 | consumed samples: 23424 | elapsed time per iteration (ms): 13645.1 | learning rate: 6.497E-06 | global batch size: 16 | lm loss: 7.287534E+00 | loss scale: 16384.0 | grad norm: 80620.713 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1465/ 159576 | consumed samples: 23440 | elapsed time per iteration (ms): 13567.3 | learning rate: 6.501E-06 | global batch size: 16 | lm loss: 7.328496E+00 | loss scale: 16384.0 | grad norm: 125622.289 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1466/ 159576 | consumed samples: 23456 | elapsed time per iteration (ms): 13597.3 | learning rate: 6.506E-06 | global batch size: 16 | lm loss: 7.289563E+00 | loss scale: 16384.0 | grad norm: 115928.663 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1467/ 159576 | consumed samples: 23472 | elapsed time per iteration (ms): 13941.8 | learning rate: 6.510E-06 | global batch size: 16 | lm loss: 7.383677E+00 | loss scale: 16384.0 | grad norm: 88787.769 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1468/ 159576 | consumed samples: 23488 | elapsed time per iteration (ms): 13557.9 | learning rate: 
6.515E-06 | global batch size: 16 | lm loss: 7.200576E+00 | loss scale: 16384.0 | grad norm: 72136.963 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1469/ 159576 | consumed samples: 23504 | elapsed time per iteration (ms): 13659.8 | learning rate: 6.519E-06 | global batch size: 16 | lm loss: 7.237146E+00 | loss scale: 16384.0 | grad norm: 80384.892 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1470/ 159576 | consumed samples: 23520 | elapsed time per iteration (ms): 13520.5 | learning rate: 6.524E-06 | global batch size: 16 | lm loss: 7.087498E+00 | loss scale: 16384.0 | grad norm: 84910.064 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1471/ 159576 | consumed samples: 23536 | elapsed time per iteration (ms): 13587.4 | learning rate: 6.528E-06 | global batch size: 16 | lm loss: 7.201303E+00 | loss scale: 16384.0 | grad norm: 82344.270 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1472/ 159576 | consumed samples: 23552 | elapsed time per iteration (ms): 13785.3 | learning rate: 6.533E-06 | global batch size: 16 | lm loss: 7.099293E+00 | loss scale: 16384.0 | grad norm: 90694.938 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1473/ 159576 | consumed samples: 23568 | elapsed time per iteration (ms): 13564.5 | learning rate: 6.537E-06 | global batch size: 16 | lm loss: 7.241871E+00 | loss scale: 16384.0 | grad norm: 49829.478 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1474/ 159576 | consumed samples: 23584 | elapsed time per iteration (ms): 13624.0 | learning rate: 6.541E-06 | global batch size: 16 | lm loss: 7.157920E+00 | loss scale: 16384.0 | grad norm: 134064.505 | num zeros: 0.0 | number of skipped iterations: 0 | number of 
nan iterations: 0 | time (ms) iteration 1475/ 159576 | consumed samples: 23600 | elapsed time per iteration (ms): 13651.2 | learning rate: 6.546E-06 | global batch size: 16 | lm loss: 7.214240E+00 | loss scale: 16384.0 | grad norm: 86872.151 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1476/ 159576 | consumed samples: 23616 | elapsed time per iteration (ms): 14166.8 | learning rate: 6.550E-06 | global batch size: 16 | lm loss: 7.192460E+00 | loss scale: 16384.0 | grad norm: 80848.938 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1477/ 159576 | consumed samples: 23632 | elapsed time per iteration (ms): 13604.7 | learning rate: 6.555E-06 | global batch size: 16 | lm loss: 7.323776E+00 | loss scale: 16384.0 | grad norm: 70454.418 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1478/ 159576 | consumed samples: 23648 | elapsed time per iteration (ms): 13572.6 | learning rate: 6.559E-06 | global batch size: 16 | lm loss: 7.268590E+00 | loss scale: 16384.0 | grad norm: 71693.339 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1479/ 159576 | consumed samples: 23664 | elapsed time per iteration (ms): 13608.6 | learning rate: 6.564E-06 | global batch size: 16 | lm loss: 7.296487E+00 | loss scale: 16384.0 | grad norm: 81654.087 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1480/ 159576 | consumed samples: 23680 | elapsed time per iteration (ms): 14039.7 | learning rate: 6.568E-06 | global batch size: 16 | lm loss: 7.090362E+00 | loss scale: 16384.0 | grad norm: 64201.153 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1481/ 159576 | consumed samples: 23696 | elapsed time per iteration (ms): 13583.2 | learning rate: 6.572E-06 | global batch size: 
16 | lm loss: 7.375229E+00 | loss scale: 16384.0 | grad norm: 113007.126 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1482/ 159576 | consumed samples: 23712 | elapsed time per iteration (ms): 13660.9 | learning rate: 6.577E-06 | global batch size: 16 | lm loss: 7.293176E+00 | loss scale: 16384.0 | grad norm: 77498.464 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1483/ 159576 | consumed samples: 23728 | elapsed time per iteration (ms): 13614.0 | learning rate: 6.581E-06 | global batch size: 16 | lm loss: 7.336072E+00 | loss scale: 16384.0 | grad norm: 110912.409 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1484/ 159576 | consumed samples: 23744 | elapsed time per iteration (ms): 13566.7 | learning rate: 6.586E-06 | global batch size: 16 | lm loss: 7.364174E+00 | loss scale: 16384.0 | grad norm: 183688.896 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1485/ 159576 | consumed samples: 23760 | elapsed time per iteration (ms): 13815.4 | learning rate: 6.590E-06 | global batch size: 16 | lm loss: 7.239150E+00 | loss scale: 16384.0 | grad norm: 72249.353 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1486/ 159576 | consumed samples: 23776 | elapsed time per iteration (ms): 13589.6 | learning rate: 6.595E-06 | global batch size: 16 | lm loss: 7.200100E+00 | loss scale: 16384.0 | grad norm: 96228.791 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1487/ 159576 | consumed samples: 23792 | elapsed time per iteration (ms): 13607.7 | learning rate: 6.599E-06 | global batch size: 16 | lm loss: 7.292061E+00 | loss scale: 16384.0 | grad norm: 121424.509 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) 
iteration 1488/ 159576 | consumed samples: 23808 | elapsed time per iteration (ms): 13632.1 | learning rate: 6.604E-06 | global batch size: 16 | lm loss: 7.136326E+00 | loss scale: 16384.0 | grad norm: 126581.190 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1489/ 159576 | consumed samples: 23824 | elapsed time per iteration (ms): 14024.4 | learning rate: 6.608E-06 | global batch size: 16 | lm loss: 7.314082E+00 | loss scale: 16384.0 | grad norm: 81672.303 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1490/ 159576 | consumed samples: 23840 | elapsed time per iteration (ms): 13562.3 | learning rate: 6.612E-06 | global batch size: 16 | lm loss: 7.220848E+00 | loss scale: 16384.0 | grad norm: 124864.436 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1491/ 159576 | consumed samples: 23856 | elapsed time per iteration (ms): 13573.1 | learning rate: 6.617E-06 | global batch size: 16 | lm loss: 7.139018E+00 | loss scale: 16384.0 | grad norm: 91430.675 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1492/ 159576 | consumed samples: 23872 | elapsed time per iteration (ms): 13614.3 | learning rate: 6.621E-06 | global batch size: 16 | lm loss: 7.268013E+00 | loss scale: 16384.0 | grad norm: 135716.036 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1493/ 159576 | consumed samples: 23888 | elapsed time per iteration (ms): 13616.6 | learning rate: 6.626E-06 | global batch size: 16 | lm loss: 7.252588E+00 | loss scale: 16384.0 | grad norm: 83740.306 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1494/ 159576 | consumed samples: 23904 | elapsed time per iteration (ms): 13959.7 | learning rate: 6.630E-06 | global batch size: 16 | lm loss: 6.975100E+00 | loss scale: 16384.0 | grad norm: 83284.334 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1495/ 159576 | consumed samples: 23920 | elapsed time per iteration (ms): 13605.9 | learning rate: 6.635E-06 | global batch size: 16 | lm loss: 7.372656E+00 | loss scale: 16384.0 | grad norm: 69225.313 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1496/ 159576 | consumed samples: 23936 | elapsed time per iteration (ms): 13623.3 | learning rate: 6.639E-06 | global batch size: 16 | lm loss: 7.219198E+00 | loss scale: 16384.0 | grad norm: 115429.784 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1497/ 159576 | consumed samples: 23952 | elapsed time per iteration (ms): 13627.9 | learning rate: 6.643E-06 | global batch size: 16 | lm loss: 7.340521E+00 | loss scale: 16384.0 | grad norm: 85290.242 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1498/ 159576 | consumed samples: 23968 | elapsed time per iteration (ms): 13884.3 | learning rate: 6.648E-06 | global batch size: 16 | lm loss: 7.186238E+00 | loss scale: 16384.0 | grad norm: 114903.673 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1499/ 159576 | consumed samples: 23984 | elapsed time per iteration (ms): 13657.7 | learning rate: 6.652E-06 | global batch size: 16 | lm loss: 7.208917E+00 | loss scale: 16384.0 | grad norm: 92023.782 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1500/ 159576 | consumed samples: 24000 | elapsed time per iteration (ms): 13609.5 | learning rate: 6.657E-06 | global batch size: 16 | lm loss: 7.191697E+00 | loss scale: 32768.0 | grad norm: 96438.628 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
saving checkpoint at iteration 1500 to /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints
[2021-09-24 08:02:02,260] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/global_step1500/mp_rank_00_model_states.pt
successfully saved checkpoint at iteration 1500 to /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints
time (ms) | save-checkpoint: 17270.67
iteration 1501/ 159576 | consumed samples: 24016 | elapsed time per iteration (ms): 31402.6 | learning rate: 6.661E-06 | global batch size: 16 | lm loss: 7.276592E+00 | loss scale: 32768.0 | grad norm: 161966.188 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1502/ 159576 | consumed samples: 24032 | elapsed time per iteration (ms): 13594.3 | learning rate: 6.666E-06 | global batch size: 16 | lm loss: 7.318794E+00 | loss scale: 32768.0 | grad norm: 194567.067 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1503/ 159576 | consumed samples: 24048 | elapsed time per iteration (ms): 13587.2 | learning rate: 6.670E-06 | global batch size: 16 | lm loss: 7.168730E+00 | loss scale: 32768.0 | grad norm: 147316.587 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1504/ 159576 | consumed samples: 24064 | elapsed time per iteration (ms): 13690.3 | learning rate: 6.675E-06 | global batch size: 16 | lm loss: 7.199265E+00 | loss scale: 32768.0 | grad norm: 160502.917 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1505/ 159576 | consumed samples: 24080 | elapsed time per iteration (ms): 14065.5 | learning rate: 6.679E-06 | global batch size: 16 | lm loss: 7.004994E+00 | loss scale: 32768.0 | grad norm: 126147.516 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1506/ 159576 | consumed samples: 24096 | elapsed time per iteration (ms): 13542.1 | learning rate: 6.683E-06 | global batch size: 16 | lm loss: 7.322471E+00 | loss scale: 32768.0 | grad norm: 196683.898 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1507/ 159576 | consumed samples: 24112 | elapsed time per iteration (ms): 13669.0 | learning rate: 6.688E-06 | global batch size: 16 | lm loss: 7.393982E+00 | loss scale: 32768.0 | grad norm: 190898.758 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
[2021-09-24 08:03:56] PULSE: tr8-104B is waiting for the previous Job Array job to finish before scheduling a new one (1162855_[2-10%1] on 'gpu_p13' partition)
[2021-09-24 08:03:56] PULSE: tr8-104B is running for 2:11:45 since 2021-09-24T05:52:11 (1162855_1 on 'gpu_p13' partition (r6i4n[5,7],r6i5n[2,7-8],r6i6n[0,2,6],r7i2n[4-5],r7i6n[2-4],r7i7n[7-8],r8i0n[2-3,5-8],r8i1n[0,2-4],r8i2n8,r8i3n[0-2],r8i5n[3-4],r8i7n[3-8],r9i0n[0-2],r9i1n[0-3],r9i2n[3-5,8],r9i3n[0-1,7-8],r9i4n[0-2],r9i5n[3-8],r9i6n[0,7-8])
iteration 1508/ 159576 | consumed samples: 24128 | elapsed time per iteration (ms): 13530.1 | learning rate: 6.692E-06 | global batch size: 16 | lm loss: 7.303823E+00 | loss scale: 32768.0 | grad norm: 138876.766 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1509/ 159576 | consumed samples: 24144 | elapsed time per iteration (ms): 13620.2 | learning rate: 6.697E-06 | global batch size: 16 | lm loss: 7.181733E+00 | loss scale: 32768.0 | grad norm: 245330.128 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1510/ 159576 | consumed samples: 24160 | elapsed time per iteration (ms): 13857.7 | learning rate: 6.701E-06 | global batch size: 16 | lm loss: 7.249762E+00 | loss scale: 32768.0 | grad norm: 178346.781 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1511/ 159576 | consumed samples: 24176 | elapsed time per iteration (ms): 13642.0 | learning rate: 6.706E-06 | global batch size: 16 | lm loss: 7.141682E+00 | loss scale: 32768.0 | grad norm: 225502.316 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1512/ 159576 | consumed samples: 24192 | elapsed time per iteration (ms): 13680.2 | learning rate: 6.710E-06 | global batch size: 16 | lm loss: 7.262461E+00 | loss scale: 32768.0 | grad norm: 152013.376 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1513/ 159576 | consumed samples: 24208 | elapsed time per iteration (ms): 6867.5 | learning rate: 6.710E-06 | global batch size: 16 | lm loss: 7.117817E+00 | loss scale: 32768.0 | grad norm: 152013.376 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1514/ 159576 | consumed samples: 24224 | elapsed time per iteration (ms): 13192.9 | learning rate: 6.714E-06 | global batch size: 16 | lm loss: 7.508438E+00 | loss scale: 32768.0 | grad norm: 277772.591 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1515/ 159576 | consumed samples: 24240 | elapsed time per iteration (ms): 13697.2 | learning rate: 6.719E-06 | global batch size: 16 | lm loss: 7.055306E+00 | loss scale: 32768.0 | grad norm: 184291.975 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1516/ 159576 | consumed samples: 24256 | elapsed time per iteration (ms): 13601.8 | learning rate: 6.723E-06 | global batch size: 16 | lm loss: 7.364224E+00 | loss scale: 32768.0 | grad norm: 153076.917 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1517/ 159576 | consumed samples: 24272 | elapsed time per iteration (ms): 13603.6 | learning rate: 6.728E-06 | global batch size: 16 | lm loss: 6.912699E+00 | loss scale: 32768.0 | grad norm: 218098.104 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1518/ 159576 | consumed samples: 24288 | elapsed time per iteration (ms): 13640.7 | learning rate: 6.732E-06 | global batch size: 16 | lm loss: 7.323909E+00 | loss scale: 32768.0 | grad norm: 216972.778 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1519/ 159576 | consumed samples: 24304 | elapsed time per iteration (ms): 14045.8 | learning rate: 6.737E-06 | global batch size: 16 | lm loss: 7.068207E+00 | loss scale: 32768.0 | grad norm: 118810.539 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1520/ 159576 | consumed samples: 24320 | elapsed time per iteration (ms): 13595.0 | learning rate: 6.741E-06 | global batch size: 16 | lm loss: 7.160398E+00 | loss scale: 32768.0 | grad norm: 174748.456 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1521/ 159576 | consumed samples: 24336 | elapsed time per iteration (ms): 13611.5 | learning rate: 6.746E-06 | global batch size: 16 | lm loss: 7.170628E+00 | loss scale: 32768.0 | grad norm: 146800.781 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1522/ 159576 | consumed samples: 24352 | elapsed time per iteration (ms): 13576.3 | learning rate: 6.750E-06 | global batch size: 16 | lm loss: 7.141685E+00 | loss scale: 32768.0 | grad norm: 301970.136 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1523/ 159576 | consumed samples: 24368 | elapsed time per iteration (ms): 13818.0 | learning rate: 6.754E-06 | global batch size: 16 | lm loss: 7.351134E+00 | loss scale: 32768.0 | grad norm: 203560.816 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1524/ 159576 | consumed samples: 24384 | elapsed time per iteration (ms): 13700.8 | learning rate: 6.759E-06 | global batch size: 16 | lm loss: 7.291396E+00 | loss scale: 32768.0 | grad norm: 186296.459 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1525/ 159576 | consumed samples: 24400 | elapsed time per iteration (ms): 13611.8 | learning rate: 6.763E-06 | global batch size: 16 | lm loss: 7.052688E+00 | loss scale: 32768.0 | grad norm: 186235.227 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1526/ 159576 | consumed samples: 24416 | elapsed time per iteration (ms): 13626.5 | learning rate: 6.768E-06 | global batch size: 16 | lm loss: 7.083735E+00 | loss scale: 32768.0 | grad norm: 254298.754 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1527/ 159576 | consumed samples: 24432 | elapsed time per iteration (ms): 13677.9 | learning rate: 6.772E-06 | global batch size: 16 | lm loss: 7.212967E+00 | loss scale: 32768.0 | grad norm: 290009.050 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1528/ 159576 | consumed samples: 24448 | elapsed time per iteration (ms): 13998.5 | learning rate: 6.777E-06 | global batch size: 16 | lm loss: 7.249606E+00 | loss scale: 32768.0 | grad norm: 193082.466 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1529/ 159576 | consumed samples: 24464 | elapsed time per iteration (ms): 13543.2 | learning rate: 6.781E-06 | global batch size: 16 | lm loss: 7.187498E+00 | loss scale: 32768.0 | grad norm: 161368.154 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1530/ 159576 | consumed samples: 24480 | elapsed time per iteration (ms): 13565.1 | learning rate: 6.786E-06 | global batch size: 16 | lm loss: 7.266234E+00 | loss scale: 32768.0 | grad norm: 198639.321 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1531/ 159576 | consumed samples: 24496 | elapsed time per iteration (ms): 13571.4 | learning rate: 6.790E-06 | global batch size: 16 | lm loss: 7.528541E+00 | loss scale: 32768.0 | grad norm: 545404.395 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1532/ 159576 | consumed samples: 24512 | elapsed time per iteration (ms): 13970.0 | learning rate: 6.794E-06 | global batch size: 16 | lm loss: 7.212701E+00 | loss scale: 32768.0 | grad norm: 227881.927 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1533/ 159576 | consumed samples: 24528 | elapsed time per iteration (ms): 13566.3 | learning rate: 6.799E-06 | global batch size: 16 | lm loss: 7.440462E+00 | loss scale: 32768.0 | grad norm: 170454.067 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1534/ 159576 | consumed samples: 24544 | elapsed time per iteration (ms): 13611.2 | learning rate: 6.803E-06 | global batch size: 16 | lm loss: 7.264073E+00 | loss scale: 32768.0 | grad norm: 306199.566 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1535/ 159576 | consumed samples: 24560 | elapsed time per iteration (ms): 13661.5 | learning rate: 6.808E-06 | global batch size: 16 | lm loss: 7.109380E+00 | loss scale: 32768.0 | grad norm: 130108.699 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1536/ 159576 | consumed samples: 24576 | elapsed time per iteration (ms): 13539.1 | learning rate: 6.812E-06 | global batch size: 16 | lm loss: 7.475006E+00 | loss scale: 32768.0 | grad norm: 447958.462 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1537/ 159576 | consumed samples: 24592 | elapsed time per iteration (ms): 13698.1 | learning rate: 6.817E-06 | global batch size: 16 | lm loss: 7.372583E+00 | loss scale: 32768.0 | grad norm: 233240.316 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1538/ 159576 | consumed samples: 24608 | elapsed time per iteration (ms): 13601.5 | learning rate: 6.821E-06 | global batch size: 16 | lm loss: 7.208574E+00 | loss scale: 32768.0 | grad norm: 208866.404 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1539/ 159576 | consumed samples: 24624 | elapsed time per iteration (ms): 13645.6 | learning rate: 6.825E-06 | global batch size: 16 | lm loss: 7.209548E+00 | loss scale: 32768.0 | grad norm: 290418.296 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1540/ 159576 | consumed samples: 24640 | elapsed time per iteration (ms): 13628.1 | learning rate: 6.830E-06 | global batch size: 16 | lm loss: 7.168006E+00 | loss scale: 32768.0 | grad norm: 271187.490 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1541/ 159576 | consumed samples: 24656 | elapsed time per iteration (ms): 14103.2 | learning rate: 6.834E-06 | global batch size: 16 | lm loss: 7.235812E+00 | loss scale: 32768.0 | grad norm: 368637.293 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1542/ 159576 | consumed samples: 24672 | elapsed time per iteration (ms): 13752.7 | learning rate: 6.839E-06 | global batch size: 16 | lm loss: 7.205466E+00 | loss scale: 32768.0 | grad norm: 275606.149 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1543/ 159576 | consumed samples: 24688 | elapsed time per iteration (ms): 13526.0 | learning rate: 6.843E-06 | global batch size: 16 | lm loss: 7.152663E+00 | loss scale: 32768.0 | grad norm: 186385.977 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1544/ 159576 | consumed samples: 24704 | elapsed time per iteration (ms): 13591.1 | learning rate: 6.848E-06 | global batch size: 16 | lm loss: 7.402153E+00 | loss scale: 32768.0 | grad norm: 202784.884 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1545/ 159576 | consumed samples: 24720 | elapsed time per iteration (ms): 13853.8 | learning rate: 6.852E-06 | global batch size: 16 | lm loss: 7.254861E+00 | loss scale: 32768.0 | grad norm: 302847.689 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1546/ 159576 | consumed samples: 24736 | elapsed time per iteration (ms): 13718.3 | learning rate: 6.857E-06 | global batch size: 16 | lm loss: 7.259928E+00 | loss scale: 32768.0 | grad norm: 190927.131 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1547/ 159576 | consumed samples: 24752 | elapsed time per iteration (ms): 13565.0 | learning rate: 6.861E-06 | global batch size: 16 | lm loss: 7.226044E+00 | loss scale: 32768.0 | grad norm: 147732.617 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1548/ 159576 | consumed samples: 24768 | elapsed time per iteration (ms): 13562.3 | learning rate: 6.865E-06 | global batch size: 16 | lm loss: 7.106945E+00 | loss scale: 32768.0 | grad norm: 275364.195 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1549/ 159576 | consumed samples: 24784 | elapsed time per iteration (ms): 13573.3 | learning rate: 6.870E-06 | global batch size: 16 | lm loss: 7.157021E+00 | loss scale: 32768.0 | grad norm: 180244.172 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1550/ 159576 | consumed samples: 24800 | elapsed time per iteration (ms): 13916.8 | learning rate: 6.874E-06 | global batch size: 16 | lm loss: 7.001479E+00 | loss scale: 32768.0 | grad norm: 268566.065 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1551/ 159576 | consumed samples: 24816 | elapsed time per iteration (ms): 13651.8 | learning rate: 6.879E-06 | global batch size: 16 | lm loss: 7.167608E+00 | loss scale: 32768.0 | grad norm: 198735.053 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1552/ 159576 | consumed samples: 24832 | elapsed time per iteration (ms): 13608.0 | learning rate: 6.883E-06 | global batch size: 16 | lm loss: 7.093953E+00 | loss scale: 32768.0 | grad norm: 170933.719 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1553/ 159576 | consumed samples: 24848 | elapsed time per iteration (ms): 13517.6 | learning rate: 6.888E-06 | global batch size: 16 | lm loss: 7.234317E+00 | loss scale: 32768.0 | grad norm: 237231.760 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1554/ 159576 | consumed samples: 24864 | elapsed time per iteration (ms): 14011.1 | learning rate: 6.892E-06 | global batch size: 16 | lm loss: 7.130560E+00 | loss scale: 32768.0 | grad norm: 237902.373 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1555/ 159576 | consumed samples: 24880 | elapsed time per iteration (ms): 13510.9 | learning rate: 6.896E-06 | global batch size: 16 | lm loss: 7.275712E+00 | loss scale: 32768.0 | grad norm: 149656.891 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1556/ 159576 | consumed samples: 24896 | elapsed time per iteration (ms): 13617.0 | learning rate: 6.901E-06 | global batch size: 16 | lm loss: 7.239087E+00 | loss scale: 32768.0 | grad norm: 186987.381 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1557/ 159576 | consumed samples: 24912 | elapsed time per iteration (ms): 13622.7 | learning rate: 6.905E-06 | global batch size: 16 | lm loss: 6.972548E+00 | loss scale: 32768.0 | grad norm: 167404.940 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1558/ 159576 | consumed samples: 24928 | elapsed time per iteration (ms): 13629.7 | learning rate: 6.910E-06 | global batch size: 16 | lm loss: 7.274665E+00 | loss scale: 32768.0 | grad norm: 170409.995 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1559/ 159576 | consumed samples: 24944 | elapsed time per iteration (ms): 13856.8 | learning rate: 6.914E-06 | global batch size: 16 | lm loss: 7.320499E+00 | loss scale: 32768.0 | grad norm: 139509.403 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1560/ 159576 | consumed samples: 24960 | elapsed time per iteration (ms): 13572.0 | learning rate: 6.919E-06 | global batch size: 16 | lm loss: 7.481147E+00 | loss scale: 32768.0 | grad norm: 204961.182 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1561/ 159576 | consumed samples: 24976 | elapsed time per iteration (ms): 13609.9 | learning rate: 6.923E-06 | global batch size: 16 | lm loss: 7.318799E+00 | loss scale: 32768.0 | grad norm: 233741.215 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1562/ 159576 | consumed samples: 24992 | elapsed time per iteration (ms): 13593.5 | learning rate: 6.928E-06 | global batch size: 16 | lm loss: 6.970228E+00 | loss scale: 32768.0 | grad norm: 159417.196 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1563/ 159576 | consumed samples: 25008 | elapsed time per iteration (ms): 13894.7 | learning rate: 6.932E-06 | global batch size: 16 | lm loss: 7.266310E+00 | loss scale: 32768.0 | grad norm: 154081.846 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1564/ 159576 | consumed samples: 25024 | elapsed time per iteration (ms): 13687.0 | learning rate: 6.936E-06 | global batch size: 16 | lm loss: 7.274476E+00 | loss scale: 32768.0 | grad norm: 258666.127 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1565/ 159576 | consumed samples: 25040 | elapsed time per iteration (ms): 13663.3 | learning rate: 6.941E-06 | global batch size: 16 | lm loss: 7.125623E+00 | loss scale: 32768.0 | grad norm: 167968.329 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1566/ 159576 | consumed samples: 25056 | elapsed time per iteration (ms): 13604.1 | learning rate: 6.945E-06 | global batch size: 16 | lm loss: 7.210727E+00 | loss scale: 32768.0 | grad norm: 198543.646 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1567/ 159576 | consumed samples: 25072 | elapsed time per iteration (ms): 14015.2 | learning rate: 6.950E-06 | global batch size: 16 | lm loss: 7.245472E+00 | loss scale: 32768.0 | grad norm: 149711.382 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1568/ 159576 | consumed samples: 25088 | elapsed time per iteration (ms): 13524.3 | learning rate: 6.954E-06 | global batch size: 16 | lm loss: 6.959779E+00 | loss scale: 32768.0 | grad norm: 217321.763 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1569/ 159576 | consumed samples: 25104 | elapsed time per iteration (ms): 13601.8 | learning rate: 6.959E-06 | global batch size: 16 | lm loss: 7.177199E+00 | loss scale: 32768.0 | grad norm: 254297.194 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1570/ 159576 | consumed samples: 25120 | elapsed time per iteration (ms): 13589.9 | learning rate: 6.963E-06 | global batch size: 16 | lm loss: 7.113214E+00 | loss scale: 32768.0 | grad norm: 172729.515 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1571/ 159576 | consumed samples: 25136 | elapsed time per iteration (ms): 13658.1 | learning rate: 6.967E-06 | global batch size: 16 | lm loss: 7.054616E+00 | loss scale: 32768.0 | grad norm: 176859.362 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1572/ 159576 | consumed samples: 25152 | elapsed time per iteration (ms): 13798.6 | learning rate: 6.972E-06 | global batch size: 16 | lm loss: 7.111713E+00 | loss scale: 32768.0 | grad norm: 165282.457 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1573/ 159576 | consumed samples: 25168 | elapsed time per iteration (ms): 13684.6 | learning rate: 6.976E-06 | global batch size: 16 | lm loss: 7.324330E+00 | loss scale: 32768.0 | grad norm: 205395.896 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1574/ 159576 | consumed samples: 25184 | elapsed time per iteration (ms): 13612.3 | learning rate: 6.981E-06 | global batch size: 16 | lm loss: 7.139562E+00 | loss scale: 32768.0 | grad norm: 201180.686 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1575/ 159576 | consumed samples: 25200 | elapsed time per iteration (ms): 13567.2 | learning rate: 6.985E-06 | global batch size: 16 | lm loss: 7.063004E+00 | loss scale: 32768.0 | grad norm: 126181.509 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1576/ 159576 | consumed samples: 25216 | elapsed time per iteration (ms): 13982.4 | learning rate: 6.990E-06 | global batch size: 16 | lm loss: 7.030066E+00 | loss scale: 32768.0 | grad norm: 261758.694 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1577/ 159576 | consumed samples: 25232 | elapsed time per iteration (ms): 13552.2 | learning rate: 6.994E-06 | global batch size: 16 | lm loss: 7.129750E+00 | loss scale: 32768.0 | grad norm: 133747.300 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1578/ 159576 | consumed samples: 25248 | elapsed time per iteration (ms): 13576.0 | learning rate: 6.999E-06 | global batch size: 16 | lm loss: 7.478085E+00 | loss scale: 32768.0 | grad norm: 193421.594 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1579/ 159576 | consumed samples: 25264 | elapsed time per iteration (ms): 13627.7 | learning rate: 7.003E-06 | global batch size: 16 | lm loss: 7.062607E+00 | loss scale: 32768.0 | grad norm: 162309.186 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1580/ 159576 | consumed samples: 25280 | elapsed time per iteration (ms): 13870.0 | learning rate: 7.007E-06 | global batch size: 16 | lm loss: 6.734056E+00 | loss scale: 32768.0 | grad norm: 233732.101 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1581/ 159576 | consumed samples: 25296 | elapsed time per iteration (ms): 13680.5 | learning rate: 7.012E-06 | global batch size: 16 | lm loss: 7.360079E+00 | loss scale: 32768.0 | grad norm: 189405.056 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1582/ 159576 | consumed samples: 25312 | elapsed time per iteration (ms): 13679.9 | learning rate: 7.016E-06 | global batch size: 16 | lm loss: 7.291443E+00 | loss scale: 32768.0 | grad norm: 159639.849 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1583/ 159576 | consumed samples: 25328 | elapsed time per iteration (ms): 13579.9 | learning rate: 7.021E-06 | global batch size: 16 | lm loss: 7.361541E+00 | loss scale: 32768.0 | grad norm: 178947.980 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1584/ 159576 | consumed samples: 25344 | elapsed time per iteration (ms): 13614.6 | learning rate: 7.025E-06 | global batch size: 16 | lm loss: 7.145397E+00 | loss scale: 32768.0 | grad norm: 198293.827 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1585/ 159576 | consumed samples: 25360 | elapsed time per iteration (ms): 13943.5 | learning rate: 7.030E-06 | global batch size: 16 | lm loss: 7.009763E+00 | loss scale: 32768.0 | grad norm: 172995.962 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1586/ 159576 | consumed samples: 25376 | elapsed time per iteration (ms): 13665.6 | learning rate: 7.034E-06 | global batch size: 16 | lm loss: 7.306109E+00 | loss scale: 32768.0 | grad norm: 193555.142 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1587/ 159576 | consumed samples: 25392 | elapsed time per iteration (ms): 13713.0 | learning rate: 7.038E-06 | global batch size: 16 | lm loss: 7.341703E+00 | loss scale: 32768.0 | grad norm: 240981.196 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1588/ 159576 | consumed samples: 25408 | elapsed time per iteration (ms): 13685.0 | learning rate: 7.043E-06 | global batch size: 16 | lm loss: 7.076401E+00 | loss scale: 32768.0 | grad norm: 144170.844 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1589/ 159576 | consumed samples: 25424 | elapsed time per iteration (ms): 13990.2 | learning rate: 7.047E-06 | global batch size: 16 | lm loss: 7.016201E+00 | loss scale: 32768.0 | grad norm: 215101.083 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1590/ 159576 | consumed samples: 25440 | elapsed time per iteration (ms): 13615.2 | learning rate: 7.052E-06 | global batch size: 16 | lm loss: 7.248097E+00 | loss scale: 32768.0 | grad norm: 183674.866 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1591/ 159576 | consumed samples: 25456 | elapsed time per iteration (ms): 13603.7 | learning rate: 7.056E-06 | global batch size: 16 | lm loss: 7.274388E+00 | loss scale: 32768.0 | grad norm: 194912.772 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1592/ 159576 | consumed samples: 25472 | elapsed time per iteration (ms): 13589.1 | learning rate: 7.061E-06 | global batch size: 16 | lm loss: 7.189001E+00 | loss scale: 32768.0 | grad norm: 178991.312 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1593/ 159576 | consumed samples: 25488 | elapsed time per iteration (ms): 13610.8 | learning rate: 7.065E-06 | global batch size: 16 | lm loss: 7.232603E+00 | loss scale: 32768.0 | grad norm: 152962.889 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1594/ 159576 | consumed samples: 25504 | elapsed time per iteration (ms): 13768.0 | learning rate: 7.070E-06 | global batch size: 16 | lm loss: 7.102697E+00 | loss scale: 32768.0 | grad norm: 144835.907 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1595/ 159576 | consumed samples: 25520 | elapsed time per iteration (ms): 13616.0 | learning rate: 7.074E-06 | global batch size: 16 | lm loss: 7.124231E+00 | loss scale: 32768.0 | grad norm: 492597.129 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1596/ 159576 | consumed samples: 25536 | elapsed time per iteration (ms): 13671.0 | learning rate: 7.078E-06 | global batch size: 16 | lm loss: 7.347673E+00 | loss scale: 32768.0 | grad norm: 283986.803 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1597/ 159576 | consumed samples: 25552 | elapsed time per iteration (ms): 13618.5 | learning rate: 7.083E-06 | global batch size: 16 | lm loss: 7.247316E+00 | loss scale: 32768.0 | grad norm: 185319.173 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1598/ 159576 | consumed samples: 25568 | elapsed time per iteration (ms): 14074.4 | learning rate: 7.087E-06 | global batch size: 16 | lm loss: 7.152137E+00 | loss scale: 32768.0 | grad norm: 179820.746 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1599/ 159576 | consumed samples: 25584 | elapsed time per iteration (ms): 13609.5 | learning rate: 7.092E-06 | global batch size: 16 | lm loss: 7.087896E+00 | loss scale: 32768.0 | grad norm: 178653.073 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1600/ 159576 | consumed samples: 25600 | elapsed time per iteration (ms): 13606.5 | learning rate: 7.096E-06 | global batch size: 16 | lm loss: 7.094335E+00 | loss scale: 32768.0 | grad norm: 197442.311 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1601/ 159576 | consumed samples: 25616 | elapsed time per iteration (ms): 13605.3 | learning rate: 7.101E-06 | global batch size: 16 | lm loss: 7.230387E+00 | loss scale: 32768.0 | grad norm: 277453.177 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1602/ 159576 | consumed samples: 25632 | elapsed time per iteration (ms): 14026.8 | learning rate: 7.105E-06 | global batch size: 16 | lm loss: 7.399794E+00 | loss scale: 32768.0 | grad norm: 202190.175 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1603/ 159576 | consumed samples: 25648 | elapsed time per iteration (ms): 13782.5 | learning rate: 7.109E-06 | global batch size: 16 | lm loss: 7.261839E+00 | loss scale: 32768.0 | grad norm: 162395.296 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1604/ 159576 | consumed samples: 25664 | elapsed time per iteration (ms): 13652.4 | learning rate: 7.114E-06 | global batch size: 16 | lm loss: 7.202652E+00 | loss scale: 32768.0 | grad norm: 199798.347 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1605/ 159576 | consumed samples: 25680 | elapsed time per iteration (ms): 13537.9 | learning rate: 7.118E-06 | global batch size: 16 | lm loss: 7.002069E+00 | loss scale: 32768.0 | grad norm: 200932.321 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1606/ 159576 | consumed samples: 25696 | elapsed time per iteration (ms): 13623.9 | learning rate: 7.123E-06 | global batch size: 16 | lm loss: 6.994870E+00 | loss scale: 32768.0 | grad norm: 182105.456 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1607/ 159576 | consumed samples: 25712 | elapsed time per iteration (ms): 13778.9 | learning rate: 7.127E-06 | global batch size: 16 | lm loss: 7.236290E+00 | loss scale: 32768.0 | grad norm: 210525.575 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1608/ 159576 | consumed samples: 25728 | elapsed time per iteration (ms): 13614.0 | learning rate: 7.132E-06 | global batch size: 16 | lm loss: 7.271640E+00 | loss scale: 32768.0 | grad norm: 155104.364 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1609/ 159576 | consumed samples: 25744 | elapsed time per iteration (ms): 13637.4 | learning rate: 7.136E-06 | global batch size: 16 | lm loss: 7.142178E+00 | loss scale: 32768.0 | grad norm: 179013.826 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1610/ 159576 | consumed samples: 25760 | elapsed time per iteration (ms): 13663.2 | learning rate: 7.141E-06 | global batch size: 16 | lm loss: 7.233703E+00 | loss scale: 32768.0 | grad norm: 205415.974 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1611/ 159576 | consumed samples: 25776 | elapsed time per iteration (ms): 14078.6 | learning rate: 7.145E-06 | global batch size: 16 | lm loss: 7.137359E+00 | loss scale: 32768.0 | grad norm: 211115.165 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1612/ 159576 | consumed samples: 25792 | elapsed time per iteration (ms): 13476.7 | learning rate: 7.149E-06 | global batch size: 16 | lm loss: 7.265315E+00 | loss scale: 32768.0 | grad norm: 221323.191 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1613/ 159576 | consumed samples: 25808 | elapsed time per iteration (ms): 13601.4 | learning rate: 7.154E-06 | global batch size: 16 | lm loss: 7.092045E+00 | loss scale: 32768.0 | grad norm: 157009.908 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1614/ 159576 | consumed samples: 25824 | elapsed time per iteration (ms): 13616.6 | learning rate: 7.158E-06 | global batch size: 16 | lm loss: 7.018819E+00 | loss scale: 32768.0 | grad norm: 198533.340 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1615/ 159576 | consumed samples: 25840 | elapsed time per iteration (ms): 13623.7 | learning rate: 7.163E-06 | global batch size: 16 | lm loss: 7.280205E+00 | loss scale: 32768.0 | grad norm: 288417.013 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1616/ 159576 | consumed samples: 25856 | elapsed time per iteration (ms): 13877.9 | learning rate: 7.167E-06 | global batch size: 16 | lm loss: 7.224732E+00 | loss scale: 32768.0 | grad norm: 186062.210 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1617/ 159576 | consumed samples: 25872 | elapsed time per iteration (ms): 13663.6 | learning rate: 7.172E-06 | global batch size: 16 | lm loss: 7.238441E+00 | loss scale: 32768.0 | grad norm: 168294.596 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1618/ 159576 | consumed samples: 25888 | elapsed time per iteration (ms): 13675.4 | learning rate: 7.176E-06 | global batch size: 16 | lm loss: 7.159503E+00 | loss scale: 32768.0 | grad norm: 181012.249 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1619/ 159576 | consumed samples: 25904 | elapsed time per iteration (ms): 13559.3 | learning rate: 7.180E-06 | global batch size: 16 | lm loss: 7.125117E+00 | loss scale: 32768.0 | grad norm: 156261.868 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1620/ 159576 | consumed samples: 25920 | elapsed time per iteration (ms): 14141.4 | learning rate: 7.185E-06 | global batch size: 16 | lm loss: 7.312489E+00 | loss scale: 32768.0 | grad norm: 501804.049 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1621/ 159576 | consumed samples: 25936 | elapsed time per iteration (ms): 13619.8 | learning rate: 7.189E-06 | global batch size: 16 | lm loss: 7.144738E+00 | loss scale: 32768.0 | grad norm: 187512.417 | num zeros: 0.0 | number of skipped iterations: 0 |
number of nan iterations: 0 | time (ms) iteration 1622/ 159576 | consumed samples: 25952 | elapsed time per iteration (ms): 13623.1 | learning rate: 7.194E-06 | global batch size: 16 | lm loss: 7.036147E+00 | loss scale: 32768.0 | grad norm: 185668.156 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1623/ 159576 | consumed samples: 25968 | elapsed time per iteration (ms): 13626.1 | learning rate: 7.198E-06 | global batch size: 16 | lm loss: 6.981637E+00 | loss scale: 32768.0 | grad norm: 194478.314 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1624/ 159576 | consumed samples: 25984 | elapsed time per iteration (ms): 13916.5 | learning rate: 7.203E-06 | global batch size: 16 | lm loss: 7.098595E+00 | loss scale: 32768.0 | grad norm: 176876.504 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1625/ 159576 | consumed samples: 26000 | elapsed time per iteration (ms): 13897.1 | learning rate: 7.207E-06 | global batch size: 16 | lm loss: 7.024785E+00 | loss scale: 32768.0 | grad norm: 133422.500 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1626/ 159576 | consumed samples: 26016 | elapsed time per iteration (ms): 13553.3 | learning rate: 7.212E-06 | global batch size: 16 | lm loss: 7.101878E+00 | loss scale: 32768.0 | grad norm: 187471.535 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1627/ 159576 | consumed samples: 26032 | elapsed time per iteration (ms): 13608.6 | learning rate: 7.216E-06 | global batch size: 16 | lm loss: 7.083658E+00 | loss scale: 32768.0 | grad norm: 163022.597 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1628/ 159576 | consumed samples: 26048 | elapsed time per iteration (ms): 13598.7 | learning rate: 7.220E-06 | 
global batch size: 16 | lm loss: 7.128680E+00 | loss scale: 32768.0 | grad norm: 227341.519 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1629/ 159576 | consumed samples: 26064 | elapsed time per iteration (ms): 13737.0 | learning rate: 7.225E-06 | global batch size: 16 | lm loss: 7.226182E+00 | loss scale: 32768.0 | grad norm: 173557.190 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1630/ 159576 | consumed samples: 26080 | elapsed time per iteration (ms): 13598.4 | learning rate: 7.229E-06 | global batch size: 16 | lm loss: 7.204190E+00 | loss scale: 32768.0 | grad norm: 194336.283 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1631/ 159576 | consumed samples: 26096 | elapsed time per iteration (ms): 13618.5 | learning rate: 7.234E-06 | global batch size: 16 | lm loss: 7.295867E+00 | loss scale: 32768.0 | grad norm: 218111.651 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1632/ 159576 | consumed samples: 26112 | elapsed time per iteration (ms): 13608.1 | learning rate: 7.238E-06 | global batch size: 16 | lm loss: 7.313629E+00 | loss scale: 32768.0 | grad norm: 150755.205 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1633/ 159576 | consumed samples: 26128 | elapsed time per iteration (ms): 13926.3 | learning rate: 7.243E-06 | global batch size: 16 | lm loss: 7.105534E+00 | loss scale: 32768.0 | grad norm: 416417.348 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1634/ 159576 | consumed samples: 26144 | elapsed time per iteration (ms): 13573.4 | learning rate: 7.247E-06 | global batch size: 16 | lm loss: 7.154237E+00 | loss scale: 32768.0 | grad norm: 222886.895 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan 
iterations: 0 | time (ms) iteration 1635/ 159576 | consumed samples: 26160 | elapsed time per iteration (ms): 13613.9 | learning rate: 7.251E-06 | global batch size: 16 | lm loss: 7.367383E+00 | loss scale: 32768.0 | grad norm: 198928.120 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1636/ 159576 | consumed samples: 26176 | elapsed time per iteration (ms): 13620.0 | learning rate: 7.256E-06 | global batch size: 16 | lm loss: 7.224826E+00 | loss scale: 32768.0 | grad norm: 190490.724 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1637/ 159576 | consumed samples: 26192 | elapsed time per iteration (ms): 13847.4 | learning rate: 7.260E-06 | global batch size: 16 | lm loss: 7.133263E+00 | loss scale: 32768.0 | grad norm: 335044.490 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1638/ 159576 | consumed samples: 26208 | elapsed time per iteration (ms): 13680.4 | learning rate: 7.265E-06 | global batch size: 16 | lm loss: 6.991650E+00 | loss scale: 32768.0 | grad norm: 351935.284 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1639/ 159576 | consumed samples: 26224 | elapsed time per iteration (ms): 13603.3 | learning rate: 7.269E-06 | global batch size: 16 | lm loss: 7.261710E+00 | loss scale: 32768.0 | grad norm: 162679.611 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1640/ 159576 | consumed samples: 26240 | elapsed time per iteration (ms): 13643.0 | learning rate: 7.274E-06 | global batch size: 16 | lm loss: 7.243075E+00 | loss scale: 32768.0 | grad norm: 139259.853 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1641/ 159576 | consumed samples: 26256 | elapsed time per iteration (ms): 13685.4 | learning rate: 7.278E-06 | global batch 
size: 16 | lm loss: 7.347486E+00 | loss scale: 32768.0 | grad norm: 190145.472 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1642/ 159576 | consumed samples: 26272 | elapsed time per iteration (ms): 13709.0 | learning rate: 7.283E-06 | global batch size: 16 | lm loss: 7.168586E+00 | loss scale: 32768.0 | grad norm: 250612.086 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1643/ 159576 | consumed samples: 26288 | elapsed time per iteration (ms): 13686.3 | learning rate: 7.287E-06 | global batch size: 16 | lm loss: 7.042645E+00 | loss scale: 32768.0 | grad norm: 181688.669 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1644/ 159576 | consumed samples: 26304 | elapsed time per iteration (ms): 13617.6 | learning rate: 7.291E-06 | global batch size: 16 | lm loss: 6.992811E+00 | loss scale: 32768.0 | grad norm: 173387.997 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1645/ 159576 | consumed samples: 26320 | elapsed time per iteration (ms): 13588.3 | learning rate: 7.296E-06 | global batch size: 16 | lm loss: 6.948548E+00 | loss scale: 32768.0 | grad norm: 204171.623 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1646/ 159576 | consumed samples: 26336 | elapsed time per iteration (ms): 13943.8 | learning rate: 7.300E-06 | global batch size: 16 | lm loss: 7.227940E+00 | loss scale: 32768.0 | grad norm: 249546.841 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1647/ 159576 | consumed samples: 26352 | elapsed time per iteration (ms): 13526.7 | learning rate: 7.305E-06 | global batch size: 16 | lm loss: 7.150325E+00 | loss scale: 32768.0 | grad norm: 187163.297 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | 
time (ms) iteration 1648/ 159576 | consumed samples: 26368 | elapsed time per iteration (ms): 13689.1 | learning rate: 7.309E-06 | global batch size: 16 | lm loss: 7.017026E+00 | loss scale: 32768.0 | grad norm: 155331.100 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1649/ 159576 | consumed samples: 26384 | elapsed time per iteration (ms): 13592.0 | learning rate: 7.314E-06 | global batch size: 16 | lm loss: 6.946849E+00 | loss scale: 32768.0 | grad norm: 224463.632 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1650/ 159576 | consumed samples: 26400 | elapsed time per iteration (ms): 13576.3 | learning rate: 7.318E-06 | global batch size: 16 | lm loss: 7.179192E+00 | loss scale: 32768.0 | grad norm: 276611.361 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1651/ 159576 | consumed samples: 26416 | elapsed time per iteration (ms): 13958.1 | learning rate: 7.322E-06 | global batch size: 16 | lm loss: 7.176366E+00 | loss scale: 32768.0 | grad norm: 180366.507 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1652/ 159576 | consumed samples: 26432 | elapsed time per iteration (ms): 13632.4 | learning rate: 7.327E-06 | global batch size: 16 | lm loss: 7.206745E+00 | loss scale: 32768.0 | grad norm: 135845.317 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1653/ 159576 | consumed samples: 26448 | elapsed time per iteration (ms): 13613.1 | learning rate: 7.331E-06 | global batch size: 16 | lm loss: 7.259154E+00 | loss scale: 32768.0 | grad norm: 403068.502 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1654/ 159576 | consumed samples: 26464 | elapsed time per iteration (ms): 13593.5 | learning rate: 7.336E-06 | global batch size: 16 | lm loss: 
7.201679E+00 | loss scale: 32768.0 | grad norm: 362463.795 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1655/ 159576 | consumed samples: 26480 | elapsed time per iteration (ms): 14016.8 | learning rate: 7.340E-06 | global batch size: 16 | lm loss: 7.291797E+00 | loss scale: 32768.0 | grad norm: 167369.816 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1656/ 159576 | consumed samples: 26496 | elapsed time per iteration (ms): 13699.1 | learning rate: 7.345E-06 | global batch size: 16 | lm loss: 7.091952E+00 | loss scale: 32768.0 | grad norm: 165135.009 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1657/ 159576 | consumed samples: 26512 | elapsed time per iteration (ms): 13569.2 | learning rate: 7.349E-06 | global batch size: 16 | lm loss: 7.068718E+00 | loss scale: 32768.0 | grad norm: 202181.410 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1658/ 159576 | consumed samples: 26528 | elapsed time per iteration (ms): 13577.2 | learning rate: 7.354E-06 | global batch size: 16 | lm loss: 7.233033E+00 | loss scale: 32768.0 | grad norm: 333361.854 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1659/ 159576 | consumed samples: 26544 | elapsed time per iteration (ms): 13970.5 | learning rate: 7.358E-06 | global batch size: 16 | lm loss: 7.330973E+00 | loss scale: 32768.0 | grad norm: 164401.480 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1660/ 159576 | consumed samples: 26560 | elapsed time per iteration (ms): 13585.6 | learning rate: 7.362E-06 | global batch size: 16 | lm loss: 7.127686E+00 | loss scale: 32768.0 | grad norm: 165830.496 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 
1661/ 159576 | consumed samples: 26576 | elapsed time per iteration (ms): 13601.7 | learning rate: 7.367E-06 | global batch size: 16 | lm loss: 7.202850E+00 | loss scale: 32768.0 | grad norm: 214035.250 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1662/ 159576 | consumed samples: 26592 | elapsed time per iteration (ms): 13596.7 | learning rate: 7.371E-06 | global batch size: 16 | lm loss: 7.194968E+00 | loss scale: 32768.0 | grad norm: 269427.808 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1663/ 159576 | consumed samples: 26608 | elapsed time per iteration (ms): 13626.2 | learning rate: 7.376E-06 | global batch size: 16 | lm loss: 7.079875E+00 | loss scale: 32768.0 | grad norm: 243204.527 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1664/ 159576 | consumed samples: 26624 | elapsed time per iteration (ms): 13820.6 | learning rate: 7.380E-06 | global batch size: 16 | lm loss: 7.253979E+00 | loss scale: 32768.0 | grad norm: 184892.216 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1665/ 159576 | consumed samples: 26640 | elapsed time per iteration (ms): 13606.7 | learning rate: 7.385E-06 | global batch size: 16 | lm loss: 7.021820E+00 | loss scale: 32768.0 | grad norm: 220398.877 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1666/ 159576 | consumed samples: 26656 | elapsed time per iteration (ms): 13594.3 | learning rate: 7.389E-06 | global batch size: 16 | lm loss: 7.115512E+00 | loss scale: 32768.0 | grad norm: 307682.966 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1667/ 159576 | consumed samples: 26672 | elapsed time per iteration (ms): 13584.1 | learning rate: 7.393E-06 | global batch size: 16 | lm loss: 7.301219E+00 | loss 
scale: 32768.0 | grad norm: 326739.461 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1668/ 159576 | consumed samples: 26688 | elapsed time per iteration (ms): 13934.9 | learning rate: 7.398E-06 | global batch size: 16 | lm loss: 7.091152E+00 | loss scale: 32768.0 | grad norm: 179218.130 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1669/ 159576 | consumed samples: 26704 | elapsed time per iteration (ms): 13576.9 | learning rate: 7.402E-06 | global batch size: 16 | lm loss: 7.060991E+00 | loss scale: 32768.0 | grad norm: 212478.902 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1670/ 159576 | consumed samples: 26720 | elapsed time per iteration (ms): 13622.1 | learning rate: 7.407E-06 | global batch size: 16 | lm loss: 7.225494E+00 | loss scale: 32768.0 | grad norm: 312859.396 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1671/ 159576 | consumed samples: 26736 | elapsed time per iteration (ms): 13558.9 | learning rate: 7.411E-06 | global batch size: 16 | lm loss: 6.931543E+00 | loss scale: 32768.0 | grad norm: 214910.265 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1672/ 159576 | consumed samples: 26752 | elapsed time per iteration (ms): 13593.0 | learning rate: 7.416E-06 | global batch size: 16 | lm loss: 7.111391E+00 | loss scale: 32768.0 | grad norm: 167374.362 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1673/ 159576 | consumed samples: 26768 | elapsed time per iteration (ms): 14083.5 | learning rate: 7.420E-06 | global batch size: 16 | lm loss: 7.119873E+00 | loss scale: 32768.0 | grad norm: 207656.393 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1674/ 159576 | 
consumed samples: 26784 | elapsed time per iteration (ms): 13580.7 | learning rate: 7.425E-06 | global batch size: 16 | lm loss: 7.190612E+00 | loss scale: 32768.0 | grad norm: 138716.556 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1675/ 159576 | consumed samples: 26800 | elapsed time per iteration (ms): 13560.5 | learning rate: 7.429E-06 | global batch size: 16 | lm loss: 7.118540E+00 | loss scale: 32768.0 | grad norm: 288523.946 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1676/ 159576 | consumed samples: 26816 | elapsed time per iteration (ms): 13591.4 | learning rate: 7.433E-06 | global batch size: 16 | lm loss: 7.228687E+00 | loss scale: 32768.0 | grad norm: 184651.956 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1677/ 159576 | consumed samples: 26832 | elapsed time per iteration (ms): 14019.3 | learning rate: 7.438E-06 | global batch size: 16 | lm loss: 7.062222E+00 | loss scale: 32768.0 | grad norm: 166988.550 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1678/ 159576 | consumed samples: 26848 | elapsed time per iteration (ms): 13663.4 | learning rate: 7.442E-06 | global batch size: 16 | lm loss: 7.206205E+00 | loss scale: 32768.0 | grad norm: 760966.811 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1679/ 159576 | consumed samples: 26864 | elapsed time per iteration (ms): 13583.3 | learning rate: 7.447E-06 | global batch size: 16 | lm loss: 7.183750E+00 | loss scale: 32768.0 | grad norm: 619056.103 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1680/ 159576 | consumed samples: 26880 | elapsed time per iteration (ms): 13598.8 | learning rate: 7.451E-06 | global batch size: 16 | lm loss: 7.188565E+00 | loss scale: 32768.0 
| grad norm: 363445.728 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1681/ 159576 | consumed samples: 26896 | elapsed time per iteration (ms): 14083.3 | learning rate: 7.456E-06 | global batch size: 16 | lm loss: 7.135269E+00 | loss scale: 32768.0 | grad norm: 201434.725 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1682/ 159576 | consumed samples: 26912 | elapsed time per iteration (ms): 13432.4 | learning rate: 7.460E-06 | global batch size: 16 | lm loss: 7.080773E+00 | loss scale: 32768.0 | grad norm: 223123.023 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1683/ 159576 | consumed samples: 26928 | elapsed time per iteration (ms): 13629.9 | learning rate: 7.464E-06 | global batch size: 16 | lm loss: 7.018581E+00 | loss scale: 32768.0 | grad norm: 160716.882 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1684/ 159576 | consumed samples: 26944 | elapsed time per iteration (ms): 13543.1 | learning rate: 7.469E-06 | global batch size: 16 | lm loss: 7.045646E+00 | loss scale: 32768.0 | grad norm: 319366.517 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1685/ 159576 | consumed samples: 26960 | elapsed time per iteration (ms): 13556.0 | learning rate: 7.473E-06 | global batch size: 16 | lm loss: 7.139486E+00 | loss scale: 32768.0 | grad norm: 154250.022 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1686/ 159576 | consumed samples: 26976 | elapsed time per iteration (ms): 13875.3 | learning rate: 7.478E-06 | global batch size: 16 | lm loss: 7.146173E+00 | loss scale: 32768.0 | grad norm: 186495.170 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1687/ 159576 | consumed samples: 
26992 | elapsed time per iteration (ms): 13583.8 | learning rate: 7.482E-06 | global batch size: 16 | lm loss: 7.207047E+00 | loss scale: 32768.0 | grad norm: 129574.140 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1688/ 159576 | consumed samples: 27008 | elapsed time per iteration (ms): 13590.1 | learning rate: 7.487E-06 | global batch size: 16 | lm loss: 7.150177E+00 | loss scale: 32768.0 | grad norm: 310199.485 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1689/ 159576 | consumed samples: 27024 | elapsed time per iteration (ms): 13636.7 | learning rate: 7.491E-06 | global batch size: 16 | lm loss: 7.136959E+00 | loss scale: 32768.0 | grad norm: 142456.264 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1690/ 159576 | consumed samples: 27040 | elapsed time per iteration (ms): 13898.3 | learning rate: 7.496E-06 | global batch size: 16 | lm loss: 6.991103E+00 | loss scale: 32768.0 | grad norm: 206942.247 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1691/ 159576 | consumed samples: 27056 | elapsed time per iteration (ms): 13637.0 | learning rate: 7.500E-06 | global batch size: 16 | lm loss: 7.147140E+00 | loss scale: 32768.0 | grad norm: 297164.074 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1692/ 159576 | consumed samples: 27072 | elapsed time per iteration (ms): 13592.2 | learning rate: 7.504E-06 | global batch size: 16 | lm loss: 7.166695E+00 | loss scale: 32768.0 | grad norm: 174829.948 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1693/ 159576 | consumed samples: 27088 | elapsed time per iteration (ms): 13634.0 | learning rate: 7.509E-06 | global batch size: 16 | lm loss: 7.124074E+00 | loss scale: 32768.0 | grad norm: 
356202.604 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1694/ 159576 | consumed samples: 27104 | elapsed time per iteration (ms): 13929.9 | learning rate: 7.513E-06 | global batch size: 16 | lm loss: 7.219958E+00 | loss scale: 32768.0 | grad norm: 288764.199 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1695/ 159576 | consumed samples: 27120 | elapsed time per iteration (ms): 13812.8 | learning rate: 7.518E-06 | global batch size: 16 | lm loss: 7.030488E+00 | loss scale: 32768.0 | grad norm: 164638.861 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1696/ 159576 | consumed samples: 27136 | elapsed time per iteration (ms): 13601.5 | learning rate: 7.522E-06 | global batch size: 16 | lm loss: 7.288185E+00 | loss scale: 32768.0 | grad norm: 241747.916 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1697/ 159576 | consumed samples: 27152 | elapsed time per iteration (ms): 13619.0 | learning rate: 7.527E-06 | global batch size: 16 | lm loss: 7.110942E+00 | loss scale: 32768.0 | grad norm: 183251.862 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1698/ 159576 | consumed samples: 27168 | elapsed time per iteration (ms): 13580.4 | learning rate: 7.531E-06 | global batch size: 16 | lm loss: 7.096193E+00 | loss scale: 32768.0 | grad norm: 187930.778 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1699/ 159576 | consumed samples: 27184 | elapsed time per iteration (ms): 14055.7 | learning rate: 7.536E-06 | global batch size: 16 | lm loss: 6.976962E+00 | loss scale: 32768.0 | grad norm: 186599.931 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1700/ 159576 | consumed samples: 27200 | elapsed 
time per iteration (ms): 13642.0 | learning rate: 7.540E-06 | global batch size: 16 | lm loss: 6.916706E+00 | loss scale: 32768.0 | grad norm: 212948.424 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1701/ 159576 | consumed samples: 27216 | elapsed time per iteration (ms): 13615.0 | learning rate: 7.544E-06 | global batch size: 16 | lm loss: 7.194331E+00 | loss scale: 32768.0 | grad norm: 144812.346 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1702/ 159576 | consumed samples: 27232 | elapsed time per iteration (ms): 13551.3 | learning rate: 7.549E-06 | global batch size: 16 | lm loss: 7.139325E+00 | loss scale: 32768.0 | grad norm: 331590.334 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1703/ 159576 | consumed samples: 27248 | elapsed time per iteration (ms): 13973.8 | learning rate: 7.553E-06 | global batch size: 16 | lm loss: 7.042914E+00 | loss scale: 32768.0 | grad norm: 195366.856 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1704/ 159576 | consumed samples: 27264 | elapsed time per iteration (ms): 13614.8 | learning rate: 7.558E-06 | global batch size: 16 | lm loss: 7.087082E+00 | loss scale: 32768.0 | grad norm: 217381.135 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1705/ 159576 | consumed samples: 27280 | elapsed time per iteration (ms): 13611.2 | learning rate: 7.562E-06 | global batch size: 16 | lm loss: 7.013979E+00 | loss scale: 32768.0 | grad norm: 198091.797 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1706/ 159576 | consumed samples: 27296 | elapsed time per iteration (ms): 13574.3 | learning rate: 7.567E-06 | global batch size: 16 | lm loss: 7.016004E+00 | loss scale: 32768.0 | grad norm: 222098.009 | num 
zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1707/ 159576 | consumed samples: 27312 | elapsed time per iteration (ms): 13629.3 | learning rate: 7.571E-06 | global batch size: 16 | lm loss: 7.175000E+00 | loss scale: 32768.0 | grad norm: 409215.441 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1708/ 159576 | consumed samples: 27328 | elapsed time per iteration (ms): 13904.2 | learning rate: 7.575E-06 | global batch size: 16 | lm loss: 7.071371E+00 | loss scale: 32768.0 | grad norm: 273410.975 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1709/ 159576 | consumed samples: 27344 | elapsed time per iteration (ms): 13558.1 | learning rate: 7.580E-06 | global batch size: 16 | lm loss: 7.002718E+00 | loss scale: 32768.0 | grad norm: 197884.964 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1710/ 159576 | consumed samples: 27360 | elapsed time per iteration (ms): 13639.3 | learning rate: 7.584E-06 | global batch size: 16 | lm loss: 7.323861E+00 | loss scale: 32768.0 | grad norm: 172073.111 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1711/ 159576 | consumed samples: 27376 | elapsed time per iteration (ms): 13631.6 | learning rate: 7.589E-06 | global batch size: 16 | lm loss: 6.922392E+00 | loss scale: 32768.0 | grad norm: 326721.457 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1712/ 159576 | consumed samples: 27392 | elapsed time per iteration (ms): 13982.8 | learning rate: 7.593E-06 | global batch size: 16 | lm loss: 7.148055E+00 | loss scale: 32768.0 | grad norm: 280337.172 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1713/ 159576 | consumed samples: 27408 | elapsed time per 
iteration (ms): 13635.8 | learning rate: 7.598E-06 | global batch size: 16 | lm loss: 7.088178E+00 | loss scale: 32768.0 | grad norm: 200762.506 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
 iteration 1714/ 159576 | consumed samples: 27424 | elapsed time per iteration (ms): 13581.9 | learning rate: 7.602E-06 | global batch size: 16 | lm loss: 7.096650E+00 | loss scale: 32768.0 | grad norm: 204299.283 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
 iteration 1715/ 159576 | consumed samples: 27440 | elapsed time per iteration (ms): 13647.6 | learning rate: 7.607E-06 | global batch size: 16 | lm loss: 6.916616E+00 | loss scale: 32768.0 | grad norm: 127407.249 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
 iteration 1716/ 159576 | consumed samples: 27456 | elapsed time per iteration (ms): 13904.0 | learning rate: 7.611E-06 | global batch size: 16 | lm loss: 7.066643E+00 | loss scale: 32768.0 | grad norm: 371440.502 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
 iteration 1717/ 159576 | consumed samples: 27472 | elapsed time per iteration (ms): 13717.4 | learning rate: 7.615E-06 | global batch size: 16 | lm loss: 7.332389E+00 | loss scale: 32768.0 | grad norm: 403592.093 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
 iteration 1718/ 159576 | consumed samples: 27488 | elapsed time per iteration (ms): 13591.7 | learning rate: 7.620E-06 | global batch size: 16 | lm loss: 7.055027E+00 | loss scale: 32768.0 | grad norm: 200151.647 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
 iteration 1719/ 159576 | consumed samples: 27504 | elapsed time per iteration (ms): 13560.8 | learning rate: 7.624E-06 | global batch size: 16 | lm loss: 7.176567E+00 | loss scale: 32768.0 | grad norm: 144423.577 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
 iteration 1720/ 159576 | consumed samples: 27520 | elapsed time per iteration (ms): 13600.7 | learning rate: 7.629E-06 | global batch size: 16 | lm loss: 6.984463E+00 | loss scale: 32768.0 | grad norm: 303766.844 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
 iteration 1721/ 159576 | consumed samples: 27536 | elapsed time per iteration (ms): 13892.8 | learning rate: 7.633E-06 | global batch size: 16 | lm loss: 6.990324E+00 | loss scale: 32768.0 | grad norm: 154861.936 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
 iteration 1722/ 159576 | consumed samples: 27552 | elapsed time per iteration (ms): 13527.0 | learning rate: 7.638E-06 | global batch size: 16 | lm loss: 7.238751E+00 | loss scale: 32768.0 | grad norm: 231731.625 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
 iteration 1723/ 159576 | consumed samples: 27568 | elapsed time per iteration (ms): 13536.8 | learning rate: 7.642E-06 | global batch size: 16 | lm loss: 7.130395E+00 | loss scale: 32768.0 | grad norm: 190824.462 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
 iteration 1724/ 159576 | consumed samples: 27584 | elapsed time per iteration (ms): 13580.6 | learning rate: 7.646E-06 | global batch size: 16 | lm loss: 7.182058E+00 | loss scale: 32768.0 | grad norm: 266208.840 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
 iteration 1725/ 159576 | consumed samples: 27600 | elapsed time per iteration (ms): 13961.0 | learning rate: 7.651E-06 | global batch size: 16 | lm loss: 7.108085E+00 | loss scale: 32768.0 | grad norm: 284420.360 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
 iteration 1726/ 159576 | consumed samples: 27616 | elapsed time per iteration (ms): 13537.5 | learning rate: 7.655E-06 | global batch size: 16 | lm loss: 7.049166E+00 | loss scale: 32768.0 | grad norm: 189929.247 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
 iteration 1727/ 159576 | consumed samples: 27632 | elapsed time per iteration (ms): 13583.4 | learning rate: 7.660E-06 | global batch size: 16 | lm loss: 7.012967E+00 | loss scale: 32768.0 | grad norm: 174720.301 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
 iteration 1728/ 159576 | consumed samples: 27648 | elapsed time per iteration (ms): 13605.5 | learning rate: 7.664E-06 | global batch size: 16 | lm loss: 7.237570E+00 | loss scale: 32768.0 | grad norm: 194798.770 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
 iteration 1729/ 159576 | consumed samples: 27664 | elapsed time per iteration (ms): 13552.5 | learning rate: 7.669E-06 | global batch size: 16 | lm loss: 7.138112E+00 | loss scale: 32768.0 | grad norm: 289252.424 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
 iteration 1730/ 159576 | consumed samples: 27680 | elapsed time per iteration (ms): 14055.9 | learning rate: 7.673E-06 | global batch size: 16 | lm loss: 7.041800E+00 | loss scale: 32768.0 | grad norm: 190020.342 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
 iteration 1731/ 159576 | consumed samples: 27696 | elapsed time per iteration (ms): 13571.4 | learning rate: 7.678E-06 | global batch size: 16 | lm loss: 7.037878E+00 | loss scale: 32768.0 | grad norm: 149538.464 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
 iteration 1732/ 159576 | consumed samples: 27712 | elapsed time per iteration (ms): 13585.4 | learning rate: 7.682E-06 | global batch size: 16 | lm loss: 7.179647E+00 | loss scale: 32768.0 | grad norm: 151351.062 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
 iteration 1733/ 159576 | consumed samples: 27728 | elapsed time per iteration (ms): 13582.2 | learning rate: 7.686E-06 | global batch size: 16 | lm loss: 7.234662E+00 | loss scale: 32768.0 | grad norm: 317716.715 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
 iteration 1734/ 159576 | consumed samples: 27744 | elapsed time per iteration (ms): 14148.8 | learning rate: 7.691E-06 | global batch size: 16 | lm loss: 7.306998E+00 | loss scale: 32768.0 | grad norm: 216190.319 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
 iteration 1735/ 159576 | consumed samples: 27760 | elapsed time per iteration (ms): 13664.2 | learning rate: 7.695E-06 | global batch size: 16 | lm loss: 7.130812E+00 | loss scale: 32768.0 | grad norm: 168041.258 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
 iteration 1736/ 159576 | consumed samples: 27776 | elapsed time per iteration (ms): 13539.2 | learning rate: 7.700E-06 | global batch size: 16 | lm loss: 7.164721E+00 | loss scale: 32768.0 | grad norm: 189764.472 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
 iteration 1737/ 159576 | consumed samples: 27792 | elapsed time per iteration (ms): 13580.1 | learning rate: 7.704E-06 | global batch size: 16 | lm loss: 7.213598E+00 | loss scale: 32768.0 | grad norm: 231432.124 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
 iteration 1738/ 159576 | consumed samples: 27808 | elapsed time per iteration (ms): 13874.0 | learning rate: 7.709E-06 | global batch size: 16 | lm loss: 7.064263E+00 | loss scale: 32768.0 | grad norm: 332299.668 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
 iteration 1739/ 159576 | consumed samples: 27824 | elapsed time per iteration (ms): 13542.8 | learning rate: 7.713E-06 | global batch size: 16 | lm loss: 7.187717E+00 | loss scale: 32768.0 | grad norm: 159503.470 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
 iteration 1740/ 159576 | consumed samples: 27840 | elapsed time per iteration (ms): 13564.1 | learning rate: 7.717E-06 | global batch size: 16 | lm loss: 7.212025E+00 | loss scale: 32768.0 | grad norm: 275497.658 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
 iteration 1741/ 159576 | consumed samples: 27856 | elapsed time per iteration (ms): 13584.8 | learning rate: 7.722E-06 | global batch size: 16 | lm loss: 6.960712E+00 | loss scale: 32768.0 | grad norm: 307419.828 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
 iteration 1742/ 159576 | consumed samples: 27872 | elapsed time per iteration (ms): 13621.1 | learning rate: 7.726E-06 | global batch size: 16 | lm loss: 7.086576E+00 | loss scale: 32768.0 | grad norm: 156758.997 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
 iteration 1743/ 159576 | consumed samples: 27888 | elapsed time per iteration (ms): 13719.9 | learning rate: 7.731E-06 | global batch size: 16 | lm loss: 6.961288E+00 | loss scale: 32768.0 | grad norm: 147761.212 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
 iteration 1744/ 159576 | consumed samples: 27904 | elapsed time per iteration (ms): 13570.6 | learning rate: 7.735E-06 | global batch size: 16 | lm loss: 7.320576E+00 | loss scale: 32768.0 | grad norm: 309786.612 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
 iteration 1745/ 159576 | consumed samples: 27920 | elapsed time per iteration (ms): 13600.3 | learning rate: 7.740E-06 | global batch size: 16 | lm loss: 7.218632E+00 | loss scale: 32768.0 | grad norm: 330698.583 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
 iteration 1746/ 159576 | consumed samples: 27936 | elapsed time per iteration (ms): 13548.3 | learning rate: 7.744E-06 | global batch size: 16 | lm loss: 7.139973E+00 | loss scale: 32768.0 | grad norm: 376967.322 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
 iteration 1747/ 159576 | consumed samples: 27952 | elapsed time per iteration (ms): 13954.3 | learning rate: 7.749E-06 | global batch size: 16 | lm loss: 7.074110E+00 | loss scale: 32768.0 | grad norm: 214147.428 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
 iteration 1748/ 159576 | consumed samples: 27968 | elapsed time per iteration (ms): 13621.8 | learning rate: 7.753E-06 | global batch size: 16 | lm loss: 7.254288E+00 | loss scale: 32768.0 | grad norm: 128937.522 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
 iteration 1749/ 159576 | consumed samples: 27984 | elapsed time per iteration (ms): 13626.6 | learning rate: 7.757E-06 | global batch size: 16 | lm loss: 7.009082E+00 | loss scale: 32768.0 | grad norm: 392446.478 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
 iteration 1750/ 159576 | consumed samples: 28000 | elapsed time per iteration (ms): 13590.6 | learning rate: 7.762E-06 | global batch size: 16 | lm loss: 6.949193E+00 | loss scale: 32768.0 | grad norm: 205911.332 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
 iteration 1751/ 159576 | consumed samples: 28016 | elapsed time per iteration (ms): 13916.9 | learning rate: 7.766E-06 | global batch size: 16 | lm loss: 7.175614E+00 | loss scale: 32768.0 | grad norm: 181359.266 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
 iteration 1752/ 159576 | consumed samples: 28032 | elapsed time per iteration (ms): 13747.5 | learning rate: 7.771E-06 | global batch size: 16 | lm loss: 7.084972E+00 | loss scale: 32768.0 | grad norm: 191810.333 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
 iteration 1753/ 159576 | consumed samples: 28048 | elapsed time per iteration (ms): 13591.1 | learning rate: 7.775E-06 | global batch size: 16 | lm loss: 7.125815E+00 | loss scale: 32768.0 | grad norm: 150833.632 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
 iteration 1754/ 159576 | consumed samples: 28064 | elapsed time per iteration (ms): 13552.4 | learning rate: 7.780E-06 | global batch size: 16 | lm loss: 7.096021E+00 | loss scale: 32768.0 | grad norm: 858159.626 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
 iteration 1755/ 159576 | consumed samples: 28080 | elapsed time per iteration (ms): 13586.8 | learning rate: 7.784E-06 | global batch size: 16 | lm loss: 7.401230E+00 | loss scale: 32768.0 | grad norm: 1015122.062 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
 iteration 1756/ 159576 | consumed samples: 28096 | elapsed time per iteration (ms): 14062.7 | learning rate: 7.788E-06 | global batch size: 16 | lm loss: 7.141807E+00 | loss scale: 32768.0 | grad norm: 241473.375 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
 iteration 1757/ 159576 | consumed samples: 28112 | elapsed time per iteration (ms): 13654.9 | learning rate: 7.793E-06 | global batch size: 16 | lm loss: 7.055682E+00 | loss scale: 32768.0 | grad norm: 195258.121 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
 iteration 1758/ 159576 | consumed samples: 28128 | elapsed time per iteration (ms): 13576.6 | learning rate: 7.797E-06 | global batch size: 16 | lm loss: 6.887124E+00 | loss scale: 32768.0 | grad norm: 209948.309 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
 iteration 1759/ 159576 | consumed samples: 28144 | elapsed time per iteration (ms): 13615.8 | learning rate: 7.802E-06 | global batch size: 16 | lm loss: 7.008955E+00 | loss scale: 32768.0 | grad norm: 218109.807 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
 iteration 1760/ 159576 | consumed samples: 28160 | elapsed time per iteration (ms): 13880.5 | learning rate: 7.806E-06 | global batch size: 16 | lm loss: 7.156555E+00 | loss scale: 32768.0 | grad norm: 199049.119 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
 iteration 1761/ 159576 | consumed samples: 28176 | elapsed time per iteration (ms): 13559.3 | learning rate: 7.811E-06 | global batch size: 16 | lm loss: 7.445184E+00 | loss scale: 32768.0 | grad norm: 571721.433 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
 iteration 1762/ 159576 | consumed samples: 28192 | elapsed time per iteration (ms): 13597.9 | learning rate: 7.815E-06 | global batch size: 16 | lm loss: 7.408930E+00 | loss scale: 32768.0 | grad norm: 477324.031 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
 iteration 1763/ 159576 | consumed samples: 28208 | elapsed time per iteration (ms): 13646.1 | learning rate: 7.820E-06 | global batch size: 16 | lm loss: 7.228862E+00 | loss scale: 32768.0 | grad norm: 183806.995 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
 iteration 1764/ 159576 | consumed samples: 28224 | elapsed time per iteration (ms): 13595.0 | learning rate: 7.824E-06 | global batch size: 16 | lm loss: 7.213759E+00 | loss scale: 32768.0 | grad norm: 199120.863 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
 iteration 1765/ 159576 | consumed samples: 28240 | elapsed time per iteration (ms): 13787.5 | learning rate: 7.828E-06 | global batch size: 16 | lm loss: 7.190694E+00 | loss scale: 32768.0 | grad norm: 230903.246 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
 iteration 1766/ 159576 | consumed samples: 28256 | elapsed time per iteration (ms): 13655.5 | learning rate: 7.833E-06 | global batch size: 16 | lm loss: 7.120300E+00 | loss scale: 32768.0 | grad norm: 189933.714 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
 iteration 1767/ 159576 | consumed samples: 28272 | elapsed time per iteration (ms): 13622.2 | learning rate: 7.837E-06 | global batch size: 16 | lm loss: 7.216377E+00 | loss scale: 32768.0 | grad norm: 411494.652 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
 iteration 1768/ 159576 | consumed samples: 28288 | elapsed time per iteration (ms): 13542.4 | learning rate: 7.842E-06 | global batch size: 16 | lm loss: 7.296084E+00 | loss scale: 32768.0 | grad norm: 211545.292 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
 iteration 1769/ 159576 | consumed samples: 28304 | elapsed time per iteration (ms): 13835.8 | learning rate: 7.846E-06 | global batch size: 16 | lm loss: 7.014185E+00 | loss scale: 32768.0 | grad norm: 159627.354 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
 iteration 1770/ 159576 | consumed samples: 28320 | elapsed time per iteration (ms): 13625.8 | learning rate: 7.851E-06 | global batch size: 16 | lm loss: 7.260840E+00 | loss scale: 32768.0 | grad norm: 230760.798 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
[2021-09-24 09:03:50] PULSE: tr8-104B is waiting for the previous job to finish before scheduling a new one using the dependency mechanism (1165978_[1-10%1] on 'gpu_p13' partition)
[2021-09-24 09:03:50] PULSE: tr8-104B is running for 3:11:39 since 2021-09-24T05:52:11 (1162855_1 on 'gpu_p13' partition (r6i4n[5,7],r6i5n[2,7-8],r6i6n[0,2,6],r7i2n[4-5],r7i6n[2-4],r7i7n[7-8],r8i0n[2-3,5-8],r8i1n[0,2-4],r8i2n8,r8i3n[0-2],r8i5n[3-4],r8i7n[3-8],r9i0n[0-2],r9i1n[0-3],r9i2n[3-5,8],r9i3n[0-1,7-8],r9i4n[0-2],r9i5n[3-8],r9i6n[0,7-8])
 iteration 1771/ 159576 | consumed samples: 28336 | elapsed time per iteration (ms): 13609.6 | learning rate: 7.855E-06 | global batch size: 16 | lm loss: 7.096549E+00 | loss scale: 32768.0 | grad norm: 208126.291 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
 iteration 1772/ 159576 | consumed samples: 28352 | elapsed time per iteration (ms): 13612.5 | learning rate: 7.859E-06 | global batch size: 16 | lm loss: 7.288601E+00 | loss scale: 32768.0 | grad norm: 299861.795 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
 iteration 1773/ 159576 | consumed samples: 28368 | elapsed time per iteration (ms): 14036.1 | learning rate: 7.864E-06 | global batch size: 16 | lm loss: 7.006525E+00 | loss scale: 32768.0 | grad norm: 221185.737 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
 iteration 1774/ 159576 | consumed samples: 28384 | elapsed time per iteration (ms): 13455.1 | learning rate: 7.868E-06 | global batch size: 16 | lm loss: 7.057816E+00 | loss scale: 32768.0 | grad norm: 211669.427 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
 iteration 1775/ 159576 | consumed samples: 28400 | elapsed time per iteration (ms): 13580.5 | learning rate: 7.873E-06 | global batch size: 16 | lm loss: 7.225205E+00 | loss scale: 32768.0 | grad norm: 232985.961 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
 iteration 1776/ 159576 | consumed samples: 28416 | elapsed time per iteration (ms): 13577.7 | learning rate: 7.877E-06 | global batch size: 16 | lm loss: 7.090505E+00 | loss scale: 32768.0 | grad norm: 148862.985 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
 iteration 1777/ 159576 | consumed samples: 28432 | elapsed time per iteration (ms): 13633.9 | learning rate: 7.882E-06 | global batch size: 16 | lm loss: 7.291343E+00 | loss scale: 32768.0 | grad norm: 241931.207 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
 iteration 1778/ 159576 | consumed samples: 28448 | elapsed time per iteration (ms): 13810.9 | learning rate: 7.886E-06 | global batch size: 16 | lm loss: 7.168088E+00 | loss scale: 32768.0 | grad norm: 186155.211 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
 iteration 1779/ 159576 | consumed samples: 28464 | elapsed time per iteration (ms): 13677.6 | learning rate: 7.891E-06 | global batch size: 16 | lm loss: 6.975587E+00 | loss scale: 32768.0 | grad norm: 141385.386 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
 iteration 1780/ 159576 | consumed samples: 28480 | elapsed time per iteration (ms): 13699.5 | learning rate: 7.895E-06 | global batch size: 16 | lm loss: 7.234455E+00 | loss scale: 32768.0 | grad norm: 167275.043 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
 iteration 1781/ 159576 | consumed samples: 28496 | elapsed time per iteration (ms): 13560.1 | learning rate: 7.899E-06 | global batch size: 16 | lm loss: 7.118816E+00 | loss scale: 32768.0 | grad norm: 185745.557 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
 iteration 1782/ 159576 | consumed samples: 28512 | elapsed time per iteration (ms): 14007.0 | learning rate: 7.904E-06 | global batch size: 16 | lm loss: 7.325441E+00 | loss scale: 32768.0 | grad norm: 151237.535 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
 iteration 1783/ 159576 | consumed samples: 28528 | elapsed time per iteration (ms): 13468.4 | learning rate: 7.908E-06 | global batch size: 16 | lm loss: 6.976577E+00 | loss scale: 32768.0 | grad norm: 157950.458 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
 iteration 1784/ 159576 | consumed samples: 28544 | elapsed time per iteration (ms): 13610.8 | learning rate: 7.913E-06 | global batch size: 16 | lm loss: 7.151215E+00 | loss scale: 32768.0 | grad norm: 185745.960 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
 iteration 1785/ 159576 | consumed samples: 28560 | elapsed time per iteration (ms): 13574.9 | learning rate: 7.917E-06 | global batch size: 16 | lm loss: 6.982706E+00 | loss scale: 32768.0 | grad norm: 212394.757 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
 iteration 1786/ 159576 | consumed samples: 28576 | elapsed time per iteration (ms): 13593.1 | learning rate: 7.922E-06 | global batch size: 16 | lm loss: 7.090255E+00 | loss scale: 32768.0 | grad norm: 165476.788 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
 iteration 1787/ 159576 | consumed samples: 28592 | elapsed time per iteration (ms): 13825.7 | learning rate: 7.926E-06 | global batch size: 16 | lm loss: 7.190539E+00 | loss scale: 32768.0 | grad norm: 105058.438 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
 iteration 1788/ 159576 | consumed samples: 28608 | elapsed time per iteration (ms): 13613.9 | learning rate: 7.930E-06 | global batch size: 16 | lm loss: 6.849520E+00 | loss scale: 32768.0 | grad norm: 180790.521 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
 iteration 1789/ 159576 | consumed samples: 28624 | elapsed time per iteration (ms): 13633.8 | learning rate: 7.935E-06 | global batch size: 16 | lm loss: 7.203046E+00 | loss scale: 32768.0 | grad norm: 126112.335 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
 iteration 1790/ 159576 | consumed samples: 28640 | elapsed time per iteration (ms): 13618.2 | learning rate: 7.939E-06 | global batch size: 16 | lm loss: 7.073618E+00 | loss scale: 32768.0 | grad norm: 138120.801 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
 iteration 1791/ 159576 | consumed samples: 28656 | elapsed time per iteration (ms): 14044.8 | learning rate: 7.944E-06 | global batch size: 16 | lm loss: 7.193256E+00 | loss scale: 32768.0 | grad norm: 127392.206 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
 iteration 1792/ 159576 | consumed samples: 28672 | elapsed time per iteration (ms): 13675.9 | learning rate: 7.948E-06 | global batch size: 16 | lm loss: 7.182660E+00 | loss scale: 32768.0 | grad norm: 128828.190 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
 iteration 1793/ 159576 | consumed samples: 28688 | elapsed time per iteration (ms): 13639.0 | learning rate: 7.953E-06 | global batch size: 16 | lm loss: 7.029709E+00 | loss scale: 32768.0 | grad norm: 123453.201 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
 iteration 1794/ 159576 | consumed samples: 28704 | elapsed time per iteration (ms): 13728.8 | learning rate: 7.957E-06 | global batch size: 16 | lm loss: 7.166730E+00 | loss scale: 32768.0 | grad norm: 117050.511 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
 iteration 1795/ 159576 | consumed samples: 28720 | elapsed time per iteration (ms): 13951.0 | learning rate: 7.962E-06 | global batch size: 16 | lm loss: 7.100776E+00 | loss scale: 32768.0 | grad norm: 166379.571 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
 iteration 1796/ 159576 | consumed samples: 28736 | elapsed time per iteration (ms): 13626.1 | learning rate: 7.966E-06 | global batch size: 16 | lm loss: 7.059687E+00 | loss scale: 32768.0 | grad norm: 165877.869 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
 iteration 1797/ 159576 | consumed samples: 28752 | elapsed time per iteration (ms): 13658.2 | learning rate: 7.970E-06 | global batch size: 16 | lm loss: 7.128800E+00 | loss scale: 32768.0 | grad norm: 241870.659 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
 iteration 1798/ 159576 | consumed samples: 28768 | elapsed time per iteration (ms): 13547.6 | learning rate: 7.975E-06 | global batch size: 16 | lm loss: 6.884446E+00 | loss scale: 32768.0 | grad norm: 129845.941 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
 iteration 1799/ 159576 | consumed samples: 28784 | elapsed time per iteration (ms): 13614.6 | learning rate: 7.979E-06 | global batch size: 16 | lm loss: 7.309677E+00 | loss scale: 32768.0 | grad norm: 156206.470 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
 iteration 1800/ 159576 | consumed samples: 28800 | elapsed time per iteration (ms): 13719.1 | learning rate: 7.984E-06 | global batch size: 16 | lm loss: 6.891129E+00 | loss scale: 32768.0 | grad norm: 130612.475 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
 iteration 1801/ 159576 | consumed samples: 28816 | elapsed time per iteration (ms): 13709.3 | learning rate: 7.988E-06 | global batch size: 16 | lm loss: 7.259354E+00 | loss scale: 32768.0 | grad norm: 299631.068 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
 iteration 1802/ 159576 | consumed samples: 28832 | elapsed time per iteration (ms): 13702.3 | learning rate: 7.993E-06 | global batch size: 16 | lm loss: 7.091782E+00 | loss scale: 32768.0 | grad norm: 164547.713 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
 iteration 1803/ 159576 | consumed samples: 28848 | elapsed time per iteration (ms): 13667.9 | learning rate: 7.997E-06 | global batch size: 16 | lm loss: 7.081347E+00 | loss scale: 32768.0 | grad norm: 157884.119 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
 iteration 1804/ 159576 | consumed samples: 28864 | elapsed time per iteration (ms): 14087.7 | learning rate: 8.001E-06 | global batch size: 16 | lm loss: 7.043708E+00 | loss scale: 32768.0 | grad norm: 179047.535 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
 iteration 1805/ 159576 | consumed samples: 28880 | elapsed time per iteration (ms): 13636.0 | learning rate: 8.006E-06 | global batch size: 16 | lm loss: 7.153672E+00 | loss scale: 32768.0 | grad norm: 171473.191 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
 iteration 1806/ 159576 | consumed samples: 28896 | elapsed time per iteration (ms): 13563.1 | learning rate: 8.010E-06 | global batch size: 16 | lm loss: 7.067021E+00 | loss scale: 32768.0 | grad norm: 114434.243 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
 iteration 1807/ 159576 | consumed samples: 28912 | elapsed time per iteration (ms): 13653.6 | learning rate: 8.015E-06 | global batch size: 16 | lm loss: 7.234491E+00 | loss scale: 32768.0 | grad norm: 149275.670 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
 iteration 1808/ 159576 | consumed samples: 28928 | elapsed time per iteration (ms): 13997.0 | learning rate: 8.019E-06 | global batch size: 16 | lm loss: 7.015783E+00 | loss scale: 32768.0 | grad norm: 179254.375 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
 iteration 1809/ 159576 | consumed samples: 28944 | elapsed time per iteration (ms): 13813.5 | learning rate: 8.024E-06 | global batch size: 16 | lm loss: 7.176732E+00 | loss scale: 32768.0 | grad norm: 180477.986 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
 iteration 1810/ 159576 | consumed samples: 28960 | elapsed time per iteration (ms): 13672.4 | learning rate: 8.028E-06 | global batch size: 16 | lm loss: 6.590204E+00 | loss scale: 32768.0 | grad norm: 149127.876 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
 iteration 1811/ 159576 | consumed samples: 28976 | elapsed time per iteration (ms): 13741.3 | learning rate: 8.033E-06 | global batch size: 16 | lm loss: 7.100949E+00 | loss scale: 32768.0 | grad norm: 133004.506 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
 iteration 1812/ 159576 | consumed samples: 28992 | elapsed time per iteration (ms): 13598.0 | learning rate: 8.037E-06 | global batch size: 16 | lm loss: 7.268322E+00 | loss scale: 32768.0 | grad norm: 287887.492 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
 iteration 1813/ 159576 | consumed samples: 29008 | elapsed time per iteration (ms): 13826.0 | learning rate: 8.041E-06 | global batch size: 16 | lm loss: 7.048282E+00 | loss scale: 32768.0 | grad norm: 147045.336 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
 iteration 1814/ 159576 | consumed samples: 29024 | elapsed time per iteration (ms): 13651.5 | learning rate: 8.046E-06 | global batch size: 16 | lm loss: 7.168237E+00 | loss scale: 32768.0 | grad norm: 167345.880 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
 iteration 1815/ 159576 | consumed samples: 29040 | elapsed time per iteration (ms): 13646.2 | learning rate: 8.050E-06 | global batch size: 16 | lm loss: 6.976926E+00 | loss scale: 32768.0 | grad norm: 173193.629 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
 iteration 1816/ 159576 | consumed samples: 29056 | elapsed time per iteration (ms): 13708.4 | learning rate: 8.055E-06 | global batch size: 16 | lm loss: 7.173286E+00 | loss scale: 32768.0 | grad norm: 156812.836 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
 iteration 1817/ 159576 | consumed samples: 29072 | elapsed time per iteration (ms): 14056.6 | learning rate: 8.059E-06 | global batch size: 16 | lm loss: 7.191895E+00 | loss scale: 32768.0 | grad norm: 254989.804 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
 iteration 1818/ 159576 | consumed samples: 29088 | elapsed time per iteration (ms): 13727.1 | learning rate: 8.064E-06 | global batch size: 16 | lm loss: 7.070405E+00 | loss scale: 32768.0 | grad norm: 128138.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
 iteration 1819/ 159576 | consumed samples: 29104 | elapsed time per iteration (ms): 13606.2 | learning rate: 8.068E-06 | global batch size: 16 | lm loss: 6.955974E+00 | loss scale: 32768.0 | grad norm: 140247.528 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
 iteration 1820/ 159576 | consumed samples: 29120 | elapsed time per iteration (ms): 13652.5 | learning rate: 8.072E-06 | global batch size: 16 | lm loss: 7.029711E+00 | loss scale: 32768.0 | grad norm: 153040.930 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
 iteration 1821/ 159576 | consumed samples: 29136 | elapsed time per iteration (ms): 13671.5 | learning rate: 8.077E-06 | global batch size: 16 | lm loss: 7.097312E+00 | loss scale: 32768.0 | grad norm: 168364.904 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
 iteration 1822/ 159576 | consumed samples: 29152 | elapsed time per iteration (ms): 13964.1 | learning rate: 8.081E-06 | global batch size: 16 | lm loss: 7.163728E+00 | loss scale: 32768.0 | grad norm: 143592.573 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
 iteration 1823/ 159576 | consumed samples: 29168 | elapsed time per iteration (ms): 13677.5 | learning rate: 8.086E-06 | global batch size: 16 | lm loss: 7.161910E+00 | loss scale: 32768.0 | grad norm: 232336.600 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
 iteration 1824/ 159576 | consumed samples: 29184 | elapsed time per iteration (ms): 13682.4 | learning rate: 8.090E-06 | global batch size: 16 | lm loss: 7.241871E+00 | loss scale: 32768.0 | grad norm: 136988.706 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
 iteration 1825/ 159576 | consumed samples: 29200 | elapsed time per iteration (ms): 13681.2 | learning rate: 8.095E-06 | global batch size: 16 | lm loss: 6.885506E+00 | loss scale: 32768.0 | grad norm: 147212.456 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
 iteration 1826/ 159576 | consumed samples: 29216 | elapsed time per iteration (ms): 14107.7 | learning rate: 8.099E-06 | global batch size: 16 | lm loss: 7.094235E+00 | loss scale: 32768.0 | grad norm: 210358.599 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
 iteration 1827/ 159576 | consumed samples: 29232 | elapsed time per iteration (ms): 13698.2 | learning rate: 8.104E-06 | global batch size: 16 | lm loss: 6.987474E+00 | loss scale: 32768.0 | grad norm: 200444.612 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
 iteration 1828/ 159576 | consumed samples: 29248 | elapsed time per iteration (ms): 13646.3 | learning rate: 8.108E-06 | global batch size: 16 | lm loss: 7.024292E+00 | loss scale: 32768.0 | grad norm: 144708.093 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
 iteration 1829/ 159576 | consumed samples: 29264 | elapsed time per iteration (ms): 13672.0 | learning rate: 8.112E-06 | global batch size: 16 | lm loss: 7.101940E+00 | loss scale: 32768.0 | grad norm: 137983.288 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
 iteration 1830/ 159576 | consumed samples: 29280 | elapsed time per iteration (ms): 13973.1 | learning rate: 8.117E-06 | global batch size: 16 | lm loss: 6.950300E+00 | loss scale: 32768.0 | grad norm: 228570.073 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
 iteration 1831/ 159576 | consumed samples: 29296 | elapsed time per iteration (ms): 13712.1 | learning rate: 8.121E-06 | global batch size: 16 | lm loss: 7.000825E+00 | loss scale: 32768.0 | grad norm: 204009.839 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
 iteration 1832/ 159576 | consumed samples: 29312 | elapsed time per iteration (ms): 13734.6 | learning rate: 8.126E-06 | global batch size: 16 | lm loss: 7.021888E+00 | loss scale: 32768.0 | grad norm: 168698.722 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
 iteration 1833/ 159576 | consumed samples: 29328 | elapsed time per iteration (ms): 13643.1 | learning rate: 8.130E-06 | global batch size: 16 | lm loss: 6.956877E+00 | loss scale: 32768.0 | grad norm: 139702.257 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
 iteration 1834/ 159576 | consumed samples: 29344 | elapsed time per iteration (ms): 13670.0 | learning rate: 8.135E-06 | global batch size: 16 | lm loss: 7.078534E+00 | loss scale: 32768.0 | grad norm: 220188.892 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
 iteration 1835/ 159576 | consumed samples: 29360 | elapsed time per iteration (ms): 13786.5 | learning rate: 8.139E-06 | global batch size: 16 | lm loss: 7.145173E+00 | loss scale: 32768.0 | grad norm: 181620.360 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
 iteration 1836/ 159576 | consumed samples: 29376 | elapsed time per iteration (ms): 13684.7 | learning rate: 8.143E-06 | global batch size: 16 | lm loss: 7.147571E+00 | loss scale: 32768.0 | grad norm: 148241.508 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
 iteration 1837/ 159576 | consumed samples: 29392 | elapsed time per iteration (ms): 13650.8 | learning rate: 8.148E-06 | global batch size: 16 | lm loss: 7.198610E+00 | loss scale: 32768.0 | grad norm: 129198.374 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
 iteration 1838/ 159576 | consumed samples: 29408 | elapsed time per iteration (ms): 13689.6 | learning rate: 8.152E-06 | global batch size: 16 | lm loss: 7.077027E+00 | loss scale: 32768.0 | grad norm: 179805.881 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
 iteration 1839/ 159576 | consumed samples: 29424 | elapsed time per iteration (ms): 14193.0 | learning rate: 8.157E-06 | global batch size: 16 | lm loss: 7.034157E+00 | loss scale: 32768.0 | grad norm: 179474.021 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
 iteration 1840/ 159576 | consumed samples: 29440 | elapsed time per iteration (ms): 13593.3 | learning rate: 8.161E-06 | global batch size: 16 | lm loss: 7.132106E+00 | loss scale: 32768.0 | grad norm: 138966.354 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
 iteration 1841/ 159576 | consumed samples: 29456 | elapsed time per iteration (ms): 13717.8 | learning rate: 8.166E-06 | global batch size: 16 | lm loss: 7.290091E+00 | loss scale: 32768.0 | grad norm: 176321.035 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan
iterations: 0 | time (ms) iteration 1842/ 159576 | consumed samples: 29472 | elapsed time per iteration (ms): 13672.3 | learning rate: 8.170E-06 | global batch size: 16 | lm loss: 7.222583E+00 | loss scale: 32768.0 | grad norm: 157190.685 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1843/ 159576 | consumed samples: 29488 | elapsed time per iteration (ms): 14041.0 | learning rate: 8.175E-06 | global batch size: 16 | lm loss: 7.080160E+00 | loss scale: 32768.0 | grad norm: 209951.002 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1844/ 159576 | consumed samples: 29504 | elapsed time per iteration (ms): 13687.6 | learning rate: 8.179E-06 | global batch size: 16 | lm loss: 7.044501E+00 | loss scale: 32768.0 | grad norm: 148871.965 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1845/ 159576 | consumed samples: 29520 | elapsed time per iteration (ms): 13645.6 | learning rate: 8.183E-06 | global batch size: 16 | lm loss: 7.157808E+00 | loss scale: 32768.0 | grad norm: 274735.365 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1846/ 159576 | consumed samples: 29536 | elapsed time per iteration (ms): 13730.4 | learning rate: 8.188E-06 | global batch size: 16 | lm loss: 6.885038E+00 | loss scale: 32768.0 | grad norm: 152141.636 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1847/ 159576 | consumed samples: 29552 | elapsed time per iteration (ms): 13619.7 | learning rate: 8.192E-06 | global batch size: 16 | lm loss: 7.235194E+00 | loss scale: 32768.0 | grad norm: 176093.463 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1848/ 159576 | consumed samples: 29568 | elapsed time per iteration (ms): 13886.2 | learning rate: 8.197E-06 | global batch 
size: 16 | lm loss: 7.254928E+00 | loss scale: 32768.0 | grad norm: 205754.293 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1849/ 159576 | consumed samples: 29584 | elapsed time per iteration (ms): 13743.9 | learning rate: 8.201E-06 | global batch size: 16 | lm loss: 7.040710E+00 | loss scale: 32768.0 | grad norm: 218799.146 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1850/ 159576 | consumed samples: 29600 | elapsed time per iteration (ms): 13589.2 | learning rate: 8.206E-06 | global batch size: 16 | lm loss: 7.048983E+00 | loss scale: 32768.0 | grad norm: 207680.104 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1851/ 159576 | consumed samples: 29616 | elapsed time per iteration (ms): 13643.5 | learning rate: 8.210E-06 | global batch size: 16 | lm loss: 7.264068E+00 | loss scale: 32768.0 | grad norm: 172145.935 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1852/ 159576 | consumed samples: 29632 | elapsed time per iteration (ms): 14007.8 | learning rate: 8.214E-06 | global batch size: 16 | lm loss: 7.091225E+00 | loss scale: 32768.0 | grad norm: 165885.271 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1853/ 159576 | consumed samples: 29648 | elapsed time per iteration (ms): 13621.7 | learning rate: 8.219E-06 | global batch size: 16 | lm loss: 7.004953E+00 | loss scale: 32768.0 | grad norm: 193763.726 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1854/ 159576 | consumed samples: 29664 | elapsed time per iteration (ms): 13705.7 | learning rate: 8.223E-06 | global batch size: 16 | lm loss: 7.337306E+00 | loss scale: 32768.0 | grad norm: 334165.602 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | 
time (ms) iteration 1855/ 159576 | consumed samples: 29680 | elapsed time per iteration (ms): 13688.7 | learning rate: 8.228E-06 | global batch size: 16 | lm loss: 7.088278E+00 | loss scale: 32768.0 | grad norm: 168305.003 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1856/ 159576 | consumed samples: 29696 | elapsed time per iteration (ms): 14064.4 | learning rate: 8.232E-06 | global batch size: 16 | lm loss: 7.075657E+00 | loss scale: 32768.0 | grad norm: 146104.687 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1857/ 159576 | consumed samples: 29712 | elapsed time per iteration (ms): 13622.8 | learning rate: 8.237E-06 | global batch size: 16 | lm loss: 7.326543E+00 | loss scale: 32768.0 | grad norm: 226986.122 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1858/ 159576 | consumed samples: 29728 | elapsed time per iteration (ms): 13661.1 | learning rate: 8.241E-06 | global batch size: 16 | lm loss: 7.226311E+00 | loss scale: 32768.0 | grad norm: 127252.080 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1859/ 159576 | consumed samples: 29744 | elapsed time per iteration (ms): 13672.4 | learning rate: 8.246E-06 | global batch size: 16 | lm loss: 7.024733E+00 | loss scale: 32768.0 | grad norm: 195136.100 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1860/ 159576 | consumed samples: 29760 | elapsed time per iteration (ms): 13685.6 | learning rate: 8.250E-06 | global batch size: 16 | lm loss: 7.050764E+00 | loss scale: 32768.0 | grad norm: 137697.941 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1861/ 159576 | consumed samples: 29776 | elapsed time per iteration (ms): 13956.5 | learning rate: 8.254E-06 | global batch size: 16 | lm loss: 
7.164598E+00 | loss scale: 32768.0 | grad norm: 186285.211 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1862/ 159576 | consumed samples: 29792 | elapsed time per iteration (ms): 13801.6 | learning rate: 8.259E-06 | global batch size: 16 | lm loss: 6.982927E+00 | loss scale: 32768.0 | grad norm: 155576.175 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1863/ 159576 | consumed samples: 29808 | elapsed time per iteration (ms): 13779.0 | learning rate: 8.263E-06 | global batch size: 16 | lm loss: 6.845668E+00 | loss scale: 32768.0 | grad norm: 211290.875 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1864/ 159576 | consumed samples: 29824 | elapsed time per iteration (ms): 13629.6 | learning rate: 8.268E-06 | global batch size: 16 | lm loss: 7.561100E+00 | loss scale: 32768.0 | grad norm: 177907.854 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1865/ 159576 | consumed samples: 29840 | elapsed time per iteration (ms): 14024.6 | learning rate: 8.272E-06 | global batch size: 16 | lm loss: 7.056180E+00 | loss scale: 32768.0 | grad norm: 132307.729 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1866/ 159576 | consumed samples: 29856 | elapsed time per iteration (ms): 13629.1 | learning rate: 8.277E-06 | global batch size: 16 | lm loss: 7.005206E+00 | loss scale: 32768.0 | grad norm: 140727.432 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1867/ 159576 | consumed samples: 29872 | elapsed time per iteration (ms): 13680.5 | learning rate: 8.281E-06 | global batch size: 16 | lm loss: 7.008940E+00 | loss scale: 32768.0 | grad norm: 149676.751 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 
1868/ 159576 | consumed samples: 29888 | elapsed time per iteration (ms): 13661.9 | learning rate: 8.286E-06 | global batch size: 16 | lm loss: 7.154263E+00 | loss scale: 32768.0 | grad norm: 181537.747 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1869/ 159576 | consumed samples: 29904 | elapsed time per iteration (ms): 13705.9 | learning rate: 8.290E-06 | global batch size: 16 | lm loss: 7.144859E+00 | loss scale: 32768.0 | grad norm: 156740.122 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1870/ 159576 | consumed samples: 29920 | elapsed time per iteration (ms): 13994.0 | learning rate: 8.294E-06 | global batch size: 16 | lm loss: 7.053184E+00 | loss scale: 32768.0 | grad norm: 209836.615 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1871/ 159576 | consumed samples: 29936 | elapsed time per iteration (ms): 13623.9 | learning rate: 8.299E-06 | global batch size: 16 | lm loss: 7.033763E+00 | loss scale: 32768.0 | grad norm: 173327.127 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1872/ 159576 | consumed samples: 29952 | elapsed time per iteration (ms): 13679.1 | learning rate: 8.303E-06 | global batch size: 16 | lm loss: 6.990786E+00 | loss scale: 32768.0 | grad norm: 281336.242 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1873/ 159576 | consumed samples: 29968 | elapsed time per iteration (ms): 13694.2 | learning rate: 8.308E-06 | global batch size: 16 | lm loss: 7.073781E+00 | loss scale: 32768.0 | grad norm: 124900.526 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1874/ 159576 | consumed samples: 29984 | elapsed time per iteration (ms): 13905.9 | learning rate: 8.312E-06 | global batch size: 16 | lm loss: 7.112270E+00 | loss 
scale: 32768.0 | grad norm: 168221.159 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1875/ 159576 | consumed samples: 30000 | elapsed time per iteration (ms): 13703.7 | learning rate: 8.317E-06 | global batch size: 16 | lm loss: 7.233196E+00 | loss scale: 32768.0 | grad norm: 174650.162 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1876/ 159576 | consumed samples: 30016 | elapsed time per iteration (ms): 13702.9 | learning rate: 8.321E-06 | global batch size: 16 | lm loss: 6.967190E+00 | loss scale: 32768.0 | grad norm: 177533.380 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1877/ 159576 | consumed samples: 30032 | elapsed time per iteration (ms): 13717.8 | learning rate: 8.325E-06 | global batch size: 16 | lm loss: 7.208225E+00 | loss scale: 32768.0 | grad norm: 207887.332 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1878/ 159576 | consumed samples: 30048 | elapsed time per iteration (ms): 14066.9 | learning rate: 8.330E-06 | global batch size: 16 | lm loss: 7.077339E+00 | loss scale: 32768.0 | grad norm: 142338.907 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1879/ 159576 | consumed samples: 30064 | elapsed time per iteration (ms): 13776.6 | learning rate: 8.334E-06 | global batch size: 16 | lm loss: 7.113251E+00 | loss scale: 32768.0 | grad norm: 158300.777 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1880/ 159576 | consumed samples: 30080 | elapsed time per iteration (ms): 13663.2 | learning rate: 8.339E-06 | global batch size: 16 | lm loss: 6.912469E+00 | loss scale: 32768.0 | grad norm: 145353.873 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1881/ 159576 | 
consumed samples: 30096 | elapsed time per iteration (ms): 13679.1 | learning rate: 8.343E-06 | global batch size: 16 | lm loss: 7.055939E+00 | loss scale: 32768.0 | grad norm: 337973.880 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1882/ 159576 | consumed samples: 30112 | elapsed time per iteration (ms): 13654.4 | learning rate: 8.348E-06 | global batch size: 16 | lm loss: 6.903512E+00 | loss scale: 32768.0 | grad norm: 240165.485 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1883/ 159576 | consumed samples: 30128 | elapsed time per iteration (ms): 13896.8 | learning rate: 8.352E-06 | global batch size: 16 | lm loss: 7.154733E+00 | loss scale: 32768.0 | grad norm: 145006.968 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1884/ 159576 | consumed samples: 30144 | elapsed time per iteration (ms): 13729.5 | learning rate: 8.357E-06 | global batch size: 16 | lm loss: 7.018287E+00 | loss scale: 32768.0 | grad norm: 447058.582 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1885/ 159576 | consumed samples: 30160 | elapsed time per iteration (ms): 13624.7 | learning rate: 8.361E-06 | global batch size: 16 | lm loss: 7.306771E+00 | loss scale: 32768.0 | grad norm: 269279.686 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1886/ 159576 | consumed samples: 30176 | elapsed time per iteration (ms): 13710.2 | learning rate: 8.365E-06 | global batch size: 16 | lm loss: 7.124641E+00 | loss scale: 32768.0 | grad norm: 184189.442 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1887/ 159576 | consumed samples: 30192 | elapsed time per iteration (ms): 14269.7 | learning rate: 8.370E-06 | global batch size: 16 | lm loss: 7.147641E+00 | loss scale: 32768.0 
| grad norm: 240777.486 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1888/ 159576 | consumed samples: 30208 | elapsed time per iteration (ms): 13668.8 | learning rate: 8.374E-06 | global batch size: 16 | lm loss: 7.246544E+00 | loss scale: 32768.0 | grad norm: 221768.309 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1889/ 159576 | consumed samples: 30224 | elapsed time per iteration (ms): 13682.0 | learning rate: 8.379E-06 | global batch size: 16 | lm loss: 7.042133E+00 | loss scale: 32768.0 | grad norm: 453492.686 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1890/ 159576 | consumed samples: 30240 | elapsed time per iteration (ms): 13683.0 | learning rate: 8.383E-06 | global batch size: 16 | lm loss: 7.161106E+00 | loss scale: 32768.0 | grad norm: 191134.227 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1891/ 159576 | consumed samples: 30256 | elapsed time per iteration (ms): 14045.3 | learning rate: 8.388E-06 | global batch size: 16 | lm loss: 7.080533E+00 | loss scale: 32768.0 | grad norm: 226207.626 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1892/ 159576 | consumed samples: 30272 | elapsed time per iteration (ms): 13740.4 | learning rate: 8.392E-06 | global batch size: 16 | lm loss: 6.948812E+00 | loss scale: 32768.0 | grad norm: 198329.312 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1893/ 159576 | consumed samples: 30288 | elapsed time per iteration (ms): 13747.4 | learning rate: 8.396E-06 | global batch size: 16 | lm loss: 7.024124E+00 | loss scale: 32768.0 | grad norm: 332574.173 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1894/ 159576 | consumed samples: 
30304 | elapsed time per iteration (ms): 13742.5 | learning rate: 8.401E-06 | global batch size: 16 | lm loss: 7.072248E+00 | loss scale: 32768.0 | grad norm: 351090.950 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1895/ 159576 | consumed samples: 30320 | elapsed time per iteration (ms): 13599.9 | learning rate: 8.405E-06 | global batch size: 16 | lm loss: 6.964484E+00 | loss scale: 32768.0 | grad norm: 180676.580 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1896/ 159576 | consumed samples: 30336 | elapsed time per iteration (ms): 13892.1 | learning rate: 8.410E-06 | global batch size: 16 | lm loss: 7.066601E+00 | loss scale: 32768.0 | grad norm: 186229.787 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1897/ 159576 | consumed samples: 30352 | elapsed time per iteration (ms): 13686.6 | learning rate: 8.414E-06 | global batch size: 16 | lm loss: 6.975677E+00 | loss scale: 32768.0 | grad norm: 145844.159 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1898/ 159576 | consumed samples: 30368 | elapsed time per iteration (ms): 13668.1 | learning rate: 8.419E-06 | global batch size: 16 | lm loss: 7.225606E+00 | loss scale: 32768.0 | grad norm: 229819.640 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1899/ 159576 | consumed samples: 30384 | elapsed time per iteration (ms): 13600.0 | learning rate: 8.423E-06 | global batch size: 16 | lm loss: 7.082514E+00 | loss scale: 32768.0 | grad norm: 185081.109 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1900/ 159576 | consumed samples: 30400 | elapsed time per iteration (ms): 14001.2 | learning rate: 8.428E-06 | global batch size: 16 | lm loss: 7.021253E+00 | loss scale: 32768.0 | grad norm: 
220377.192 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1901/ 159576 | consumed samples: 30416 | elapsed time per iteration (ms): 13722.2 | learning rate: 8.432E-06 | global batch size: 16 | lm loss: 7.049896E+00 | loss scale: 32768.0 | grad norm: 166889.016 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1902/ 159576 | consumed samples: 30432 | elapsed time per iteration (ms): 13621.3 | learning rate: 8.436E-06 | global batch size: 16 | lm loss: 6.878879E+00 | loss scale: 32768.0 | grad norm: 145213.866 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1903/ 159576 | consumed samples: 30448 | elapsed time per iteration (ms): 13693.3 | learning rate: 8.441E-06 | global batch size: 16 | lm loss: 6.981446E+00 | loss scale: 32768.0 | grad norm: 385714.234 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1904/ 159576 | consumed samples: 30464 | elapsed time per iteration (ms): 13924.8 | learning rate: 8.445E-06 | global batch size: 16 | lm loss: 7.065192E+00 | loss scale: 32768.0 | grad norm: 230309.474 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1905/ 159576 | consumed samples: 30480 | elapsed time per iteration (ms): 13762.9 | learning rate: 8.450E-06 | global batch size: 16 | lm loss: 7.016763E+00 | loss scale: 32768.0 | grad norm: 164701.451 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1906/ 159576 | consumed samples: 30496 | elapsed time per iteration (ms): 13644.6 | learning rate: 8.454E-06 | global batch size: 16 | lm loss: 6.935023E+00 | loss scale: 32768.0 | grad norm: 158636.532 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1907/ 159576 | consumed samples: 30512 | elapsed 
time per iteration (ms): 13659.2 | learning rate: 8.459E-06 | global batch size: 16 | lm loss: 7.008549E+00 | loss scale: 32768.0 | grad norm: 216415.844 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1908/ 159576 | consumed samples: 30528 | elapsed time per iteration (ms): 13777.8 | learning rate: 8.463E-06 | global batch size: 16 | lm loss: 7.210999E+00 | loss scale: 32768.0 | grad norm: 201609.115 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1909/ 159576 | consumed samples: 30544 | elapsed time per iteration (ms): 13647.1 | learning rate: 8.467E-06 | global batch size: 16 | lm loss: 7.035434E+00 | loss scale: 32768.0 | grad norm: 157381.108 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1910/ 159576 | consumed samples: 30560 | elapsed time per iteration (ms): 13657.7 | learning rate: 8.472E-06 | global batch size: 16 | lm loss: 7.002993E+00 | loss scale: 32768.0 | grad norm: 137094.187 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1911/ 159576 | consumed samples: 30576 | elapsed time per iteration (ms): 13538.8 | learning rate: 8.476E-06 | global batch size: 16 | lm loss: 6.895042E+00 | loss scale: 32768.0 | grad norm: 201565.995 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1912/ 159576 | consumed samples: 30592 | elapsed time per iteration (ms): 13570.4 | learning rate: 8.481E-06 | global batch size: 16 | lm loss: 7.119932E+00 | loss scale: 32768.0 | grad norm: 191020.294 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1913/ 159576 | consumed samples: 30608 | elapsed time per iteration (ms): 13960.8 | learning rate: 8.485E-06 | global batch size: 16 | lm loss: 7.021863E+00 | loss scale: 32768.0 | grad norm: 163947.486 | num 
zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1914/ 159576 | consumed samples: 30624 | elapsed time per iteration (ms): 13571.3 | learning rate: 8.490E-06 | global batch size: 16 | lm loss: 7.255896E+00 | loss scale: 32768.0 | grad norm: 110811.833 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1915/ 159576 | consumed samples: 30640 | elapsed time per iteration (ms): 13592.9 | learning rate: 8.494E-06 | global batch size: 16 | lm loss: 7.058972E+00 | loss scale: 32768.0 | grad norm: 226666.177 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1916/ 159576 | consumed samples: 30656 | elapsed time per iteration (ms): 13559.3 | learning rate: 8.499E-06 | global batch size: 16 | lm loss: 7.001413E+00 | loss scale: 32768.0 | grad norm: 155562.702 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1917/ 159576 | consumed samples: 30672 | elapsed time per iteration (ms): 13603.1 | learning rate: 8.503E-06 | global batch size: 16 | lm loss: 6.925358E+00 | loss scale: 32768.0 | grad norm: 153599.875 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1918/ 159576 | consumed samples: 30688 | elapsed time per iteration (ms): 13848.6 | learning rate: 8.507E-06 | global batch size: 16 | lm loss: 7.013722E+00 | loss scale: 32768.0 | grad norm: 151847.788 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1919/ 159576 | consumed samples: 30704 | elapsed time per iteration (ms): 13580.7 | learning rate: 8.512E-06 | global batch size: 16 | lm loss: 7.057837E+00 | loss scale: 32768.0 | grad norm: 149268.841 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1920/ 159576 | consumed samples: 30720 | elapsed time per 
iteration (ms): 13579.6 | learning rate: 8.516E-06 | global batch size: 16 | lm loss: 7.059657E+00 | loss scale: 32768.0 | grad norm: 211843.149 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1921/ 159576 | consumed samples: 30736 | elapsed time per iteration (ms): 13716.2 | learning rate: 8.521E-06 | global batch size: 16 | lm loss: 7.145122E+00 | loss scale: 32768.0 | grad norm: 158831.213 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1922/ 159576 | consumed samples: 30752 | elapsed time per iteration (ms): 14204.8 | learning rate: 8.525E-06 | global batch size: 16 | lm loss: 7.012016E+00 | loss scale: 32768.0 | grad norm: 142219.675 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1923/ 159576 | consumed samples: 30768 | elapsed time per iteration (ms): 13586.3 | learning rate: 8.530E-06 | global batch size: 16 | lm loss: 6.958722E+00 | loss scale: 32768.0 | grad norm: 147958.053 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1924/ 159576 | consumed samples: 30784 | elapsed time per iteration (ms): 13654.4 | learning rate: 8.534E-06 | global batch size: 16 | lm loss: 6.916204E+00 | loss scale: 32768.0 | grad norm: 168316.999 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1925/ 159576 | consumed samples: 30800 | elapsed time per iteration (ms): 13581.4 | learning rate: 8.538E-06 | global batch size: 16 | lm loss: 7.208139E+00 | loss scale: 32768.0 | grad norm: 186895.870 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1926/ 159576 | consumed samples: 30816 | elapsed time per iteration (ms): 14057.7 | learning rate: 8.543E-06 | global batch size: 16 | lm loss: 6.921901E+00 | loss scale: 32768.0 | grad norm: 136886.936 | num zeros: 0.0 | 
number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1927/ 159576 | consumed samples: 30832 | elapsed time per iteration (ms): 13553.3 | learning rate: 8.547E-06 | global batch size: 16 | lm loss: 7.044703E+00 | loss scale: 32768.0 | grad norm: 318519.845 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1928/ 159576 | consumed samples: 30848 | elapsed time per iteration (ms): 13594.1 | learning rate: 8.552E-06 | global batch size: 16 | lm loss: 6.906800E+00 | loss scale: 32768.0 | grad norm: 155021.065 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1929/ 159576 | consumed samples: 30864 | elapsed time per iteration (ms): 13607.1 | learning rate: 8.556E-06 | global batch size: 16 | lm loss: 6.881465E+00 | loss scale: 32768.0 | grad norm: 190717.011 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1930/ 159576 | consumed samples: 30880 | elapsed time per iteration (ms): 13551.6 | learning rate: 8.561E-06 | global batch size: 16 | lm loss: 7.199529E+00 | loss scale: 32768.0 | grad norm: 191859.870 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1931/ 159576 | consumed samples: 30896 | elapsed time per iteration (ms): 13806.2 | learning rate: 8.565E-06 | global batch size: 16 | lm loss: 6.954100E+00 | loss scale: 32768.0 | grad norm: 130775.699 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1932/ 159576 | consumed samples: 30912 | elapsed time per iteration (ms): 13613.1 | learning rate: 8.570E-06 | global batch size: 16 | lm loss: 6.704428E+00 | loss scale: 32768.0 | grad norm: 137607.979 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1933/ 159576 | consumed samples: 30928 | elapsed time per iteration (ms): 13506.4 | learning rate: 8.574E-06 | global batch size: 16 | lm loss: 7.014212E+00 | loss scale: 32768.0 | grad norm: 186579.256 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1934/ 159576 | consumed samples: 30944 | elapsed time per iteration (ms): 13520.6 | learning rate: 8.578E-06 | global batch size: 16 | lm loss: 7.012688E+00 | loss scale: 32768.0 | grad norm: 155464.251 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1935/ 159576 | consumed samples: 30960 | elapsed time per iteration (ms): 13855.4 | learning rate: 8.583E-06 | global batch size: 16 | lm loss: 7.011374E+00 | loss scale: 32768.0 | grad norm: 128570.064 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1936/ 159576 | consumed samples: 30976 | elapsed time per iteration (ms): 13483.8 | learning rate: 8.587E-06 | global batch size: 16 | lm loss: 6.823971E+00 | loss scale: 32768.0 | grad norm: 185286.139 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1937/ 159576 | consumed samples: 30992 | elapsed time per iteration (ms): 13455.5 | learning rate: 8.592E-06 | global batch size: 16 | lm loss: 7.002713E+00 | loss scale: 32768.0 | grad norm: 168834.944 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1938/ 159576 | consumed samples: 31008 | elapsed time per iteration (ms): 13488.7 | learning rate: 8.596E-06 | global batch size: 16 | lm loss: 7.308265E+00 | loss scale: 32768.0 | grad norm: 113334.432 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1939/ 159576 | consumed samples: 31024 | elapsed time per iteration (ms): 13517.8 | learning rate: 8.601E-06 | global batch size: 16 | lm loss: 6.832065E+00 | loss scale: 32768.0 | grad norm: 143617.951 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1940/ 159576 | consumed samples: 31040 | elapsed time per iteration (ms): 13777.8 | learning rate: 8.605E-06 | global batch size: 16 | lm loss: 6.758460E+00 | loss scale: 32768.0 | grad norm: 131000.555 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1941/ 159576 | consumed samples: 31056 | elapsed time per iteration (ms): 13526.9 | learning rate: 8.609E-06 | global batch size: 16 | lm loss: 6.587332E+00 | loss scale: 32768.0 | grad norm: 133270.011 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1942/ 159576 | consumed samples: 31072 | elapsed time per iteration (ms): 13522.3 | learning rate: 8.614E-06 | global batch size: 16 | lm loss: 7.005889E+00 | loss scale: 32768.0 | grad norm: 169934.736 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1943/ 159576 | consumed samples: 31088 | elapsed time per iteration (ms): 13505.7 | learning rate: 8.618E-06 | global batch size: 16 | lm loss: 7.113358E+00 | loss scale: 32768.0 | grad norm: 147469.388 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1944/ 159576 | consumed samples: 31104 | elapsed time per iteration (ms): 14004.8 | learning rate: 8.623E-06 | global batch size: 16 | lm loss: 6.815184E+00 | loss scale: 32768.0 | grad norm: 129420.793 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1945/ 159576 | consumed samples: 31120 | elapsed time per iteration (ms): 13536.0 | learning rate: 8.627E-06 | global batch size: 16 | lm loss: 6.802580E+00 | loss scale: 32768.0 | grad norm: 206454.023 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1946/ 159576 | consumed samples: 31136 | elapsed time per iteration (ms): 13571.2 | learning rate: 8.632E-06 | global batch size: 16 | lm loss: 6.899452E+00 | loss scale: 32768.0 | grad norm: 159625.238 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1947/ 159576 | consumed samples: 31152 | elapsed time per iteration (ms): 13512.7 | learning rate: 8.636E-06 | global batch size: 16 | lm loss: 6.902468E+00 | loss scale: 32768.0 | grad norm: 161374.302 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1948/ 159576 | consumed samples: 31168 | elapsed time per iteration (ms): 13965.3 | learning rate: 8.641E-06 | global batch size: 16 | lm loss: 7.027518E+00 | loss scale: 32768.0 | grad norm: 141898.251 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1949/ 159576 | consumed samples: 31184 | elapsed time per iteration (ms): 13617.6 | learning rate: 8.645E-06 | global batch size: 16 | lm loss: 6.901030E+00 | loss scale: 32768.0 | grad norm: 115156.669 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1950/ 159576 | consumed samples: 31200 | elapsed time per iteration (ms): 13549.7 | learning rate: 8.649E-06 | global batch size: 16 | lm loss: 7.012411E+00 | loss scale: 32768.0 | grad norm: 364327.043 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1951/ 159576 | consumed samples: 31216 | elapsed time per iteration (ms): 13460.7 | learning rate: 8.654E-06 | global batch size: 16 | lm loss: 6.996010E+00 | loss scale: 32768.0 | grad norm: 265923.298 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1952/ 159576 | consumed samples: 31232 | elapsed time per iteration (ms): 13574.9 | learning rate: 8.658E-06 | global batch size: 16 | lm loss: 7.002955E+00 | loss scale: 32768.0 | grad norm: 147080.962 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1953/ 159576 | consumed samples: 31248 | elapsed time per iteration (ms): 13782.5 | learning rate: 8.663E-06 | global batch size: 16 | lm loss: 6.930263E+00 | loss scale: 32768.0 | grad norm: 190217.592 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1954/ 159576 | consumed samples: 31264 | elapsed time per iteration (ms): 13515.2 | learning rate: 8.667E-06 | global batch size: 16 | lm loss: 6.835277E+00 | loss scale: 32768.0 | grad norm: 254678.528 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1955/ 159576 | consumed samples: 31280 | elapsed time per iteration (ms): 13569.3 | learning rate: 8.672E-06 | global batch size: 16 | lm loss: 7.283230E+00 | loss scale: 32768.0 | grad norm: 137167.505 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1956/ 159576 | consumed samples: 31296 | elapsed time per iteration (ms): 13592.0 | learning rate: 8.676E-06 | global batch size: 16 | lm loss: 6.895840E+00 | loss scale: 32768.0 | grad norm: 198657.902 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1957/ 159576 | consumed samples: 31312 | elapsed time per iteration (ms): 13906.4 | learning rate: 8.680E-06 | global batch size: 16 | lm loss: 7.127283E+00 | loss scale: 32768.0 | grad norm: 242163.922 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1958/ 159576 | consumed samples: 31328 | elapsed time per iteration (ms): 13647.9 | learning rate: 8.685E-06 | global batch size: 16 | lm loss: 7.022318E+00 | loss scale: 32768.0 | grad norm: 179227.362 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1959/ 159576 | consumed samples: 31344 | elapsed time per iteration (ms): 13668.0 | learning rate: 8.689E-06 | global batch size: 16 | lm loss: 7.021772E+00 | loss scale: 32768.0 | grad norm: 223437.294 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1960/ 159576 | consumed samples: 31360 | elapsed time per iteration (ms): 13699.2 | learning rate: 8.694E-06 | global batch size: 16 | lm loss: 7.270517E+00 | loss scale: 32768.0 | grad norm: 166965.849 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1961/ 159576 | consumed samples: 31376 | elapsed time per iteration (ms): 13595.5 | learning rate: 8.698E-06 | global batch size: 16 | lm loss: 6.963766E+00 | loss scale: 32768.0 | grad norm: 257581.689 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1962/ 159576 | consumed samples: 31392 | elapsed time per iteration (ms): 13818.3 | learning rate: 8.703E-06 | global batch size: 16 | lm loss: 6.847409E+00 | loss scale: 32768.0 | grad norm: 162709.033 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1963/ 159576 | consumed samples: 31408 | elapsed time per iteration (ms): 13645.3 | learning rate: 8.707E-06 | global batch size: 16 | lm loss: 6.902783E+00 | loss scale: 32768.0 | grad norm: 186486.366 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1964/ 159576 | consumed samples: 31424 | elapsed time per iteration (ms): 13637.0 | learning rate: 8.712E-06 | global batch size: 16 | lm loss: 7.112407E+00 | loss scale: 32768.0 | grad norm: 234566.375 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1965/ 159576 | consumed samples: 31440 | elapsed time per iteration (ms): 13632.5 | learning rate: 8.716E-06 | global batch size: 16 | lm loss: 6.965158E+00 | loss scale: 32768.0 | grad norm: 162405.643 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1966/ 159576 | consumed samples: 31456 | elapsed time per iteration (ms): 13923.2 | learning rate: 8.720E-06 | global batch size: 16 | lm loss: 7.162685E+00 | loss scale: 32768.0 | grad norm: 160740.607 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1967/ 159576 | consumed samples: 31472 | elapsed time per iteration (ms): 13722.5 | learning rate: 8.725E-06 | global batch size: 16 | lm loss: 6.822609E+00 | loss scale: 32768.0 | grad norm: 163162.027 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1968/ 159576 | consumed samples: 31488 | elapsed time per iteration (ms): 13559.9 | learning rate: 8.729E-06 | global batch size: 16 | lm loss: 6.829067E+00 | loss scale: 32768.0 | grad norm: 148991.615 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1969/ 159576 | consumed samples: 31504 | elapsed time per iteration (ms): 13640.6 | learning rate: 8.734E-06 | global batch size: 16 | lm loss: 6.753247E+00 | loss scale: 32768.0 | grad norm: 174635.290 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1970/ 159576 | consumed samples: 31520 | elapsed time per iteration (ms): 13996.0 | learning rate: 8.738E-06 | global batch size: 16 | lm loss: 7.113372E+00 | loss scale: 32768.0 | grad norm: 278150.407 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1971/ 159576 | consumed samples: 31536 | elapsed time per iteration (ms): 13669.9 | learning rate: 8.743E-06 | global batch size: 16 | lm loss: 6.872749E+00 | loss scale: 32768.0 | grad norm: 176866.437 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1972/ 159576 | consumed samples: 31552 | elapsed time per iteration (ms): 13634.0 | learning rate: 8.747E-06 | global batch size: 16 | lm loss: 6.944706E+00 | loss scale: 32768.0 | grad norm: 145690.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1973/ 159576 | consumed samples: 31568 | elapsed time per iteration (ms): 13676.3 | learning rate: 8.751E-06 | global batch size: 16 | lm loss: 7.106283E+00 | loss scale: 32768.0 | grad norm: 154568.562 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1974/ 159576 | consumed samples: 31584 | elapsed time per iteration (ms): 13610.0 | learning rate: 8.756E-06 | global batch size: 16 | lm loss: 7.001073E+00 | loss scale: 32768.0 | grad norm: 156908.897 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1975/ 159576 | consumed samples: 31600 | elapsed time per iteration (ms): 13727.1 | learning rate: 8.760E-06 | global batch size: 16 | lm loss: 7.050818E+00 | loss scale: 32768.0 | grad norm: 234696.902 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1976/ 159576 | consumed samples: 31616 | elapsed time per iteration (ms): 13612.3 | learning rate: 8.765E-06 | global batch size: 16 | lm loss: 7.084875E+00 | loss scale: 32768.0 | grad norm: 169650.883 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1977/ 159576 | consumed samples: 31632 | elapsed time per iteration (ms): 13652.4 | learning rate: 8.769E-06 | global batch size: 16 | lm loss: 6.942274E+00 | loss scale: 32768.0 | grad norm: 133422.940 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1978/ 159576 | consumed samples: 31648 | elapsed time per iteration (ms): 13598.6 | learning rate: 8.774E-06 | global batch size: 16 | lm loss: 7.020503E+00 | loss scale: 32768.0 | grad norm: 191046.458 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1979/ 159576 | consumed samples: 31664 | elapsed time per iteration (ms): 6793.7 | learning rate: 8.774E-06 | global batch size: 16 | lm loss: 7.205068E+00 | loss scale: 16384.0 | grad norm: 191046.458 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1980/ 159576 | consumed samples: 31680 | elapsed time per iteration (ms): 13294.9 | learning rate: 8.778E-06 | global batch size: 16 | lm loss: 6.981399E+00 | loss scale: 16384.0 | grad norm: 88750.748 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1981/ 159576 | consumed samples: 31696 | elapsed time per iteration (ms): 13611.4 | learning rate: 8.783E-06 | global batch size: 16 | lm loss: 7.062120E+00 | loss scale: 16384.0 | grad norm: 98643.338 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1982/ 159576 | consumed samples: 31712 | elapsed time per iteration (ms): 13593.8 | learning rate: 8.787E-06 | global batch size: 16 | lm loss: 6.878181E+00 | loss scale: 16384.0 | grad norm: 67555.616 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1983/ 159576 | consumed samples: 31728 | elapsed time per iteration (ms): 13656.6 | learning rate: 8.791E-06 | global batch size: 16 | lm loss: 6.958256E+00 | loss scale: 16384.0 | grad norm: 79163.237 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1984/ 159576 | consumed samples: 31744 | elapsed time per iteration (ms): 13863.2 | learning rate: 8.796E-06 | global batch size: 16 | lm loss: 6.850488E+00 | loss scale: 16384.0 | grad norm: 49908.825 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1985/ 159576 | consumed samples: 31760 | elapsed time per iteration (ms): 13625.0 | learning rate: 8.800E-06 | global batch size: 16 | lm loss: 7.227520E+00 | loss scale: 16384.0 | grad norm: 56779.919 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1986/ 159576 | consumed samples: 31776 | elapsed time per iteration (ms): 13644.4 | learning rate: 8.805E-06 | global batch size: 16 | lm loss: 7.002261E+00 | loss scale: 16384.0 | grad norm: 88929.296 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1987/ 159576 | consumed samples: 31792 | elapsed time per iteration (ms): 13690.4 | learning rate: 8.809E-06 | global batch size: 16 | lm loss: 7.085162E+00 | loss scale: 16384.0 | grad norm: 50454.218 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1988/ 159576 | consumed samples: 31808 | elapsed time per iteration (ms): 13934.9 | learning rate: 8.814E-06 | global batch size: 16 | lm loss: 6.948382E+00 | loss scale: 16384.0 | grad norm: 95360.624 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1989/ 159576 | consumed samples: 31824 | elapsed time per iteration (ms): 13779.2 | learning rate: 8.818E-06 | global batch size: 16 | lm loss: 6.810514E+00 | loss scale: 16384.0 | grad norm: 64656.236 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1990/ 159576 | consumed samples: 31840 | elapsed time per iteration (ms): 13639.8 | learning rate: 8.822E-06 | global batch size: 16 | lm loss: 6.904098E+00 | loss scale: 16384.0 | grad norm: 77126.795 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1991/ 159576 | consumed samples: 31856 | elapsed time per iteration (ms): 13559.7 | learning rate: 8.827E-06 | global batch size: 16 | lm loss: 6.833849E+00 | loss scale: 16384.0 | grad norm: 68875.930 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1992/ 159576 | consumed samples: 31872 | elapsed time per iteration (ms): 13602.8 | learning rate: 8.831E-06 | global batch size: 16 | lm loss: 6.989305E+00 | loss scale: 16384.0 | grad norm: 77647.510 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1993/ 159576 | consumed samples: 31888 | elapsed time per iteration (ms): 13976.7 | learning rate: 8.836E-06 | global batch size: 16 | lm loss: 6.928751E+00 | loss scale: 16384.0 | grad norm: 67757.778 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1994/ 159576 | consumed samples: 31904 | elapsed time per iteration (ms): 13704.1 | learning rate: 8.840E-06 | global batch size: 16 | lm loss: 6.835466E+00 | loss scale: 16384.0 | grad norm: 69187.044 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1995/ 159576 | consumed samples: 31920 | elapsed time per iteration (ms): 13650.9 | learning rate: 8.845E-06 | global batch size: 16 | lm loss: 7.294861E+00 | loss scale: 16384.0 | grad norm: 143539.847 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1996/ 159576 | consumed samples: 31936 | elapsed time per iteration (ms): 13627.5 | learning rate: 8.849E-06 | global batch size: 16 | lm loss: 7.121392E+00 | loss scale: 16384.0 | grad norm: 74325.382 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1997/ 159576 | consumed samples: 31952 | elapsed time per iteration (ms): 13965.6 | learning rate: 8.854E-06 | global batch size: 16 | lm loss: 7.100244E+00 | loss scale: 16384.0 | grad norm: 92130.836 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1998/ 159576 | consumed samples: 31968 | elapsed time per iteration (ms): 13575.4 | learning rate: 8.858E-06 | global batch size: 16 | lm loss: 7.115794E+00 | loss scale: 16384.0 | grad norm: 85348.558 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1999/ 159576 | consumed samples: 31984 | elapsed time per iteration (ms): 13570.5 | learning rate: 8.862E-06 | global batch size: 16 | lm loss: 6.867460E+00 | loss scale: 16384.0 | grad norm: 112217.552 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
[2021-09-24 09:55:59,653] [INFO] [logging.py:68:log_dist] [Rank 0] step=2000, skipped=2, lr=[8.866863905325443e-06, 8.866863905325443e-06], mom=[(0.9, 0.999), (0.9, 0.999)]
steps: 2000 loss: 6.9298 iter time (s): 0.007 samples/sec: 2456.816
iteration 2000/ 159576 | consumed samples: 32000 | elapsed time per iteration (ms): 13633.7 | learning rate: 8.867E-06 | global batch size: 16 | lm loss: 6.929829E+00 | loss scale: 16384.0 | grad norm: 73846.467 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
------------------------------------------------------------------------------------------------
validation loss at iteration 2000 | lm loss value: 7.084489E+00 | lm loss PPL: 1.193313E+03 |
------------------------------------------------------------------------------------------------
iteration 2001/ 159576 | consumed samples: 32016 | elapsed time per iteration (ms): 18999.4 | learning rate: 8.871E-06 | global batch size: 16 | lm loss: 6.882600E+00 | loss scale: 16384.0 | grad norm: 132358.129 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2002/ 159576 | consumed samples: 32032 | elapsed time per iteration (ms): 13626.5 | learning rate: 8.876E-06 | global batch size: 16 | lm loss: 7.231313E+00 | loss scale: 16384.0 | grad norm: 139453.166 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2003/ 159576 | consumed samples: 32048 | elapsed time per iteration (ms): 13687.4 | learning rate: 8.880E-06 | global batch size: 16 | lm loss: 7.034769E+00 | loss scale: 16384.0 | grad norm: 74117.400 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2004/ 159576 | consumed samples: 32064 | elapsed time per iteration (ms): 13579.3 | learning rate: 8.885E-06 | global batch size: 16 | lm loss: 7.053939E+00 | loss scale: 16384.0 | grad norm: 185455.314 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2005/ 159576 | consumed samples: 32080 | elapsed time per iteration (ms): 13617.6 | learning rate: 8.889E-06 | global batch size: 16 | lm loss: 6.871277E+00 | loss scale: 16384.0 | grad norm: 117343.684 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2006/ 159576 | consumed samples: 32096 | elapsed time per iteration (ms): 13892.7 | learning rate: 8.893E-06 | global batch size: 16 | lm loss: 6.839181E+00 | loss scale: 16384.0 | grad norm: 77619.124 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2007/ 159576 | consumed samples: 32112 | elapsed time per iteration (ms): 13580.2 | learning rate: 8.898E-06 | global batch size: 16 | lm loss: 7.031313E+00 | loss scale: 16384.0 | grad norm: 111506.485 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2008/ 159576 | consumed samples: 32128 | elapsed time per iteration (ms): 13652.0 | learning rate: 8.902E-06 | global batch size: 16 | lm loss: 6.763354E+00 | loss scale: 16384.0 | grad norm: 74284.647 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2009/ 159576 | consumed samples: 32144 | elapsed time per iteration (ms): 13663.9 | learning rate: 8.907E-06 | global batch size: 16 | lm loss: 7.173141E+00 | loss scale: 16384.0 | grad norm: 176920.841 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2010/ 159576 | consumed samples: 32160 | elapsed time per iteration (ms): 14071.2 | learning rate: 8.911E-06 | global batch size: 16 | lm loss: 6.940368E+00 | loss scale: 16384.0 | grad norm: 136609.771 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2011/ 159576 | consumed samples: 32176 | elapsed time per iteration (ms): 13641.6 | learning rate: 8.916E-06 | global batch size: 16 | lm loss: 7.348205E+00 | loss scale: 16384.0 | grad norm: 74685.726 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2012/ 159576 | consumed samples: 32192 | elapsed time per iteration (ms): 13599.3 | learning rate: 8.920E-06 | global batch size: 16 | lm loss: 6.813260E+00 | loss scale: 16384.0 | grad norm: 98269.295 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2013/ 159576 | consumed samples: 32208 | elapsed time per iteration (ms): 13658.0 | learning rate: 8.925E-06 | global batch size: 16 | lm loss: 7.088203E+00 | loss scale: 16384.0 | grad norm: 67591.274 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2014/ 159576 | consumed samples: 32224 | elapsed time per iteration (ms): 14073.3 | learning rate: 8.929E-06 | global batch size: 16 | lm loss: 6.925144E+00 | loss scale: 16384.0 | grad norm: 125518.891 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2015/ 159576 | consumed samples: 32240 | elapsed time per iteration (ms): 13531.4 | learning rate: 8.933E-06 | global batch size: 16 | lm loss: 7.150875E+00 | loss scale: 16384.0 | grad norm: 145833.664 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2016/ 159576 | consumed samples: 32256 | elapsed time per iteration (ms): 13718.9 | learning rate: 8.938E-06 | global batch size: 16 | lm loss: 7.058916E+00 | loss scale: 16384.0 | grad norm: 104576.621 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2017/ 159576 | consumed samples: 32272 | elapsed time per iteration (ms): 13660.3 | learning rate: 8.942E-06 | global batch size: 16 | lm loss: 7.075126E+00 | loss scale: 16384.0 | grad norm: 68969.823 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2018/ 159576 | consumed samples: 32288 | elapsed time per iteration (ms): 13657.9 | learning rate: 8.947E-06 | global batch size: 16 | lm loss: 7.021468E+00 | loss scale: 16384.0 | grad norm: 102873.081 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2019/ 159576 | consumed samples: 32304 | elapsed time per iteration (ms): 13864.5 | learning rate: 8.951E-06 | global batch size: 16 | lm loss: 7.182456E+00 | loss scale: 16384.0 | grad norm: 83098.867 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2020/ 159576 | consumed samples: 32320 | elapsed time per iteration (ms): 13595.8 | learning rate: 8.956E-06 | global batch size: 16 | lm loss: 7.201014E+00 | loss scale: 16384.0 | grad norm: 86577.891 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2021/ 159576 | consumed samples: 32336 | elapsed time per iteration (ms): 13656.2 | learning rate: 8.960E-06 | global batch size: 16 | lm loss: 7.021406E+00 | loss scale: 16384.0 | grad norm: 81681.230 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2022/ 159576 | consumed samples: 32352 | elapsed time per iteration (ms): 13573.2 | learning rate: 8.964E-06 | global batch size: 16 | lm loss: 7.084285E+00 | loss scale: 16384.0 | grad norm: 87860.375 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2023/ 159576 | consumed samples: 32368 | elapsed time per iteration (ms): 13983.6 | learning rate: 8.969E-06 | global batch size: 16 | lm loss: 6.934657E+00 | loss scale: 16384.0 | grad norm: 59691.509 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2024/ 159576 | consumed samples: 32384 | elapsed time per iteration (ms): 13601.4 | learning rate: 8.973E-06 | global batch size: 16 | lm loss: 7.007637E+00 | loss scale: 16384.0 | grad norm: 90222.500 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2025/ 159576 | consumed samples: 32400 | elapsed time per iteration (ms): 13711.5 | learning rate: 8.978E-06 | global batch size: 16 | lm loss: 6.979746E+00 | loss scale: 16384.0 | grad norm: 93849.629 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2026/ 159576 | consumed samples: 32416 | elapsed time per iteration (ms): 13699.6 | learning rate: 8.982E-06 | global batch size: 16 | lm loss: 6.934021E+00 | loss scale: 16384.0 | grad norm: 80041.099 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2027/ 159576 | consumed samples: 32432 | elapsed time per iteration (ms): 14076.1 | learning rate: 8.987E-06 | global batch size: 16 | lm loss: 6.980267E+00 | loss scale: 16384.0 | grad norm: 62895.732 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2028/ 159576 | consumed samples: 32448 | elapsed time per iteration (ms): 13679.2 | learning rate: 8.991E-06 | global batch size: 16 | lm loss: 7.024888E+00 | loss scale: 16384.0 | grad norm: 52171.920 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2029/ 159576 | consumed samples: 32464 | elapsed time per iteration (ms): 13587.5 | learning rate: 8.996E-06 | global batch size: 16 | lm loss: 7.115479E+00 | loss scale: 16384.0 | grad norm: 102889.917 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2030/ 159576 | consumed samples: 32480 | elapsed time per iteration (ms): 13601.6 | learning rate: 9.000E-06 | global batch size: 16 | lm loss: 7.058015E+00 | loss scale: 16384.0 | grad norm: 59629.338 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2031/ 159576 | consumed samples: 32496 | elapsed time per iteration (ms): 13586.5 | learning rate: 9.004E-06 | global batch size: 16 | lm loss: 7.114190E+00 | loss scale: 16384.0 | grad norm: 71212.111 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2032/ 159576 | consumed samples: 32512 | elapsed time per iteration (ms): 13640.1 | learning rate: 9.009E-06 | global batch size: 16 | lm loss: 7.060964E+00 | loss scale: 16384.0 | grad norm: 64723.435 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2033/ 159576 | consumed samples: 32528 | elapsed time per iteration (ms): 13600.9 | learning rate: 9.013E-06 | global batch size: 16 | lm loss: 7.134828E+00 | loss scale: 16384.0 | grad norm: 56762.338 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2034/ 159576 | consumed samples: 32544 | elapsed time per iteration (ms): 13742.8 | learning rate: 9.018E-06 | global batch size: 16 | lm loss: 7.147020E+00 | loss scale: 16384.0 | grad norm: 116614.867 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2035/ 159576 | consumed samples: 32560 | elapsed time per iteration (ms): 13462.2 | learning rate: 9.022E-06 | global batch size: 16 | lm loss: 7.059257E+00 | loss scale: 16384.0 | grad norm: 95862.140 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2036/ 159576 | consumed samples: 32576 | elapsed time per iteration (ms): 13758.4 | learning rate: 9.027E-06 | global batch size: 16 | lm loss: 7.181802E+00 | loss scale: 16384.0 | grad norm: 60688.861 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2037/ 159576 | consumed samples: 32592 | elapsed time per iteration (ms): 13746.2 | learning rate: 9.031E-06 | global batch size: 16 | lm loss: 6.934449E+00 | loss scale: 16384.0 | grad norm: 71317.463 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2038/ 159576 | consumed samples: 32608 | elapsed time per iteration (ms): 13575.3 | learning rate: 9.036E-06 | global batch size: 16 | lm loss: 7.080134E+00 | loss scale: 16384.0 | grad norm: 57102.289 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2039/ 159576 | consumed samples: 32624 | elapsed time per iteration (ms): 13585.4 | learning rate: 9.040E-06 | global batch size: 16 | lm loss: 6.727631E+00 | loss scale: 16384.0 | grad norm: 74231.007 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2040/ 159576 | consumed samples: 32640 | elapsed time per iteration (ms): 13615.9 | learning rate: 9.044E-06 | global batch size: 16 | lm loss: 6.939786E+00 | loss scale: 16384.0 | grad norm: 75764.732 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
WARNING:apscheduler.scheduler:Execution of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-24 10:05:20 CEST)" skipped: maximum number of running instances reached (3)
iteration 2041/ 159576 | consumed samples: 32656 | elapsed time per iteration (ms): 14017.0 | learning rate: 9.049E-06 | global batch size: 16 | lm loss: 6.912762E+00 | loss scale: 16384.0 | grad norm: 60359.456 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2042/ 159576 | consumed samples: 32672 | elapsed time per iteration (ms): 13386.8 | learning rate: 9.053E-06 | global batch size: 16 | lm loss: 6.892349E+00 | loss scale: 16384.0 | grad norm: 68369.533 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
[2021-09-24 10:05:52] PULSE: tr8-104B is waiting for the previous job to finish before scheduling a new one using the dependency mechanism (1165978_[1-10%1] on 'gpu_p13' partition)
[2021-09-24 10:05:52] PULSE: tr8-104B is running for 4:13:41 since 2021-09-24T05:52:11 (1162855_1 on 'gpu_p13' partition (r6i4n[5,7],r6i5n[2,7-8],r6i6n[0,2,6],r7i2n[4-5],r7i6n[2-4],r7i7n[7-8],r8i0n[2-3,5-8],r8i1n[0,2-4],r8i2n8,r8i3n[0-2],r8i5n[3-4],r8i7n[3-8],r9i0n[0-2],r9i1n[0-3],r9i2n[3-5,8],r9i3n[0-1,7-8],r9i4n[0-2],r9i5n[3-8],r9i6n[0,7-8])
iteration 2043/ 159576 | consumed samples: 32688 | elapsed time per iteration (ms): 13496.3 | learning rate: 9.058E-06 | global batch size: 16 | lm loss: 7.106496E+00 | loss scale: 16384.0 | grad norm: 74847.038 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2044/ 159576 | consumed samples: 32704 | elapsed time per iteration (ms): 13461.5 | learning rate: 9.062E-06 | global batch size: 16 | lm loss: 7.101841E+00 | loss scale: 16384.0 | grad norm: 81326.664 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2045/ 159576 | consumed samples: 32720 | elapsed time per iteration (ms): 14029.5 | learning rate: 9.067E-06 | global batch size: 16 | lm loss: 6.818883E+00 | loss scale: 16384.0 | grad norm: 55780.102 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2046/ 159576 | consumed samples: 32736 | elapsed time per iteration (ms): 13528.3 | learning rate: 9.071E-06 | global batch size: 16 | lm loss: 7.344654E+00 | loss scale: 16384.0 | grad norm: 85807.867 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2047/ 159576 | consumed samples: 32752 | elapsed time per iteration (ms): 13633.2 | learning rate: 9.075E-06 | global batch size: 16 | lm loss: 7.041794E+00 | loss scale: 16384.0 | grad norm: 68040.665 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2048/ 159576 | consumed samples: 32768 | elapsed time per iteration (ms): 13714.3 | learning rate: 9.080E-06 | global batch size: 16 | lm loss: 7.051764E+00 | loss scale: 16384.0 | grad norm: 54860.412 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2049/ 159576 | consumed samples: 32784 | elapsed time per iteration (ms): 13991.3 | learning rate: 9.084E-06 | global batch size: 16 | lm loss: 6.824497E+00 | loss scale: 16384.0 | grad norm: 71323.543 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2050/ 159576 | consumed samples: 32800 | elapsed time per iteration (ms): 13606.5 | learning rate: 9.089E-06 | global batch size: 16 | lm loss: 7.182322E+00 | loss scale: 16384.0 | grad norm: 85719.647 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2051/ 159576 | consumed samples: 32816 | elapsed time per iteration (ms): 13580.8 | learning rate: 9.093E-06 | global batch size: 16 | lm loss: 7.293634E+00 | loss scale: 16384.0 | grad norm: 80588.421 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2052/ 159576 | consumed samples: 32832 | elapsed time per iteration (ms): 13550.0 | learning rate: 9.098E-06 | global batch size: 16 | lm loss: 7.101615E+00 | loss scale: 16384.0 | grad norm: 84442.652 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2053/ 159576 | consumed samples: 32848 | elapsed time per iteration (ms): 13599.2 | learning rate: 9.102E-06 | global batch size: 16 | lm loss: 7.037670E+00 | loss scale: 16384.0 | grad norm: 66660.579 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2054/ 159576 | consumed samples: 32864 | elapsed time per iteration (ms): 13845.0 | learning rate: 9.107E-06 | global batch size: 16 | lm loss: 7.019003E+00 | loss scale: 16384.0 | grad norm: 62001.885 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2055/ 159576 | consumed samples: 32880 | elapsed time per iteration (ms): 13669.5 | learning rate: 9.111E-06 | global batch size: 16 | lm loss: 6.911786E+00 | loss scale: 16384.0 | grad norm: 117097.154 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2056/ 159576 | consumed samples: 32896 | elapsed time per iteration (ms): 13595.0 | learning rate: 9.115E-06 | global batch size: 16 | lm loss: 7.090348E+00 | loss scale: 16384.0 | grad norm: 84113.874 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2057/ 159576 | consumed samples: 32912 | elapsed time per iteration (ms): 13602.9 | learning rate: 9.120E-06 | global batch size: 16 | lm loss: 6.805397E+00 | loss scale: 16384.0 | grad norm: 74285.496 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2058/ 159576 | consumed samples: 32928 | elapsed time per iteration (ms): 13938.5 | learning rate: 9.124E-06 | global batch size: 16 | lm loss: 7.156925E+00 | loss scale: 16384.0 | grad norm: 123564.876 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2059/ 159576 | consumed samples: 32944 | elapsed time per iteration (ms): 13535.6 | learning rate: 9.129E-06 | global batch size: 16 | lm loss: 7.097910E+00 | loss scale: 16384.0 | grad norm: 80614.365 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2060/ 159576 | consumed samples: 32960 | elapsed time per iteration (ms): 13561.1 | learning rate: 9.133E-06 | global batch size: 16 | lm loss: 7.173540E+00 | loss scale: 16384.0 | grad norm: 82969.227 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2061/ 159576 | consumed samples: 32976 | elapsed time per iteration (ms): 13641.0 | learning rate: 9.138E-06 | global batch size: 16 | lm loss: 6.963642E+00 | loss scale: 16384.0 | grad norm: 58968.599 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2062/ 159576 | consumed samples: 32992 | elapsed time per iteration (ms): 13737.9 | learning rate: 9.142E-06 | global batch size: 16 | lm loss: 6.932078E+00 | loss scale: 16384.0 | grad norm: 176037.023 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2063/ 159576 | consumed samples: 33008 | elapsed time per iteration (ms): 13779.6 | learning rate: 9.146E-06 | global batch size: 16 | lm loss: 6.904696E+00 | loss scale: 16384.0 | grad norm: 107303.418 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2064/ 159576 | consumed samples: 33024 | elapsed time per iteration (ms): 13634.2 | learning rate: 9.151E-06 | global batch size: 16 | lm loss: 6.834531E+00 | loss scale: 16384.0 | grad norm: 100378.838 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2065/ 159576 | consumed samples: 33040 | elapsed time per iteration (ms): 13654.1 | learning rate: 9.155E-06 | global batch size: 16 | lm loss: 7.101809E+00 | loss scale: 16384.0 | grad norm: 100637.535 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2066/ 159576 | consumed samples: 33056 | elapsed time per iteration (ms): 13496.2 | learning rate: 
9.160E-06 | global batch size: 16 | lm loss: 6.822946E+00 | loss scale: 16384.0 | grad norm: 72463.326 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2067/ 159576 | consumed samples: 33072 | elapsed time per iteration (ms): 14117.2 | learning rate: 9.164E-06 | global batch size: 16 | lm loss: 7.133995E+00 | loss scale: 16384.0 | grad norm: 265928.980 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2068/ 159576 | consumed samples: 33088 | elapsed time per iteration (ms): 13658.0 | learning rate: 9.169E-06 | global batch size: 16 | lm loss: 7.058832E+00 | loss scale: 16384.0 | grad norm: 225451.637 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2069/ 159576 | consumed samples: 33104 | elapsed time per iteration (ms): 13647.8 | learning rate: 9.173E-06 | global batch size: 16 | lm loss: 6.733691E+00 | loss scale: 16384.0 | grad norm: 109352.478 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2070/ 159576 | consumed samples: 33120 | elapsed time per iteration (ms): 13662.1 | learning rate: 9.178E-06 | global batch size: 16 | lm loss: 7.330385E+00 | loss scale: 16384.0 | grad norm: 106190.502 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2071/ 159576 | consumed samples: 33136 | elapsed time per iteration (ms): 14047.9 | learning rate: 9.182E-06 | global batch size: 16 | lm loss: 6.902629E+00 | loss scale: 16384.0 | grad norm: 105263.547 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2072/ 159576 | consumed samples: 33152 | elapsed time per iteration (ms): 13604.8 | learning rate: 9.186E-06 | global batch size: 16 | lm loss: 7.059223E+00 | loss scale: 16384.0 | grad norm: 156071.065 | num zeros: 0.0 | number of skipped iterations: 0 | number 
of nan iterations: 0 | time (ms) iteration 2073/ 159576 | consumed samples: 33168 | elapsed time per iteration (ms): 13509.3 | learning rate: 9.191E-06 | global batch size: 16 | lm loss: 6.858756E+00 | loss scale: 16384.0 | grad norm: 183069.137 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2074/ 159576 | consumed samples: 33184 | elapsed time per iteration (ms): 13577.0 | learning rate: 9.195E-06 | global batch size: 16 | lm loss: 7.137619E+00 | loss scale: 16384.0 | grad norm: 165868.654 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2075/ 159576 | consumed samples: 33200 | elapsed time per iteration (ms): 13598.1 | learning rate: 9.200E-06 | global batch size: 16 | lm loss: 7.105383E+00 | loss scale: 16384.0 | grad norm: 81641.263 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2076/ 159576 | consumed samples: 33216 | elapsed time per iteration (ms): 13844.7 | learning rate: 9.204E-06 | global batch size: 16 | lm loss: 6.954556E+00 | loss scale: 16384.0 | grad norm: 90347.722 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2077/ 159576 | consumed samples: 33232 | elapsed time per iteration (ms): 13642.3 | learning rate: 9.209E-06 | global batch size: 16 | lm loss: 6.986308E+00 | loss scale: 16384.0 | grad norm: 71161.614 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2078/ 159576 | consumed samples: 33248 | elapsed time per iteration (ms): 13714.7 | learning rate: 9.213E-06 | global batch size: 16 | lm loss: 7.186345E+00 | loss scale: 16384.0 | grad norm: 125006.131 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2079/ 159576 | consumed samples: 33264 | elapsed time per iteration (ms): 13724.6 | learning rate: 9.217E-06 | global batch 
size: 16 | lm loss: 7.046529E+00 | loss scale: 16384.0 | grad norm: 72474.668 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2080/ 159576 | consumed samples: 33280 | elapsed time per iteration (ms): 13823.6 | learning rate: 9.222E-06 | global batch size: 16 | lm loss: 6.926587E+00 | loss scale: 16384.0 | grad norm: 72628.016 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2081/ 159576 | consumed samples: 33296 | elapsed time per iteration (ms): 13659.2 | learning rate: 9.226E-06 | global batch size: 16 | lm loss: 6.850713E+00 | loss scale: 16384.0 | grad norm: 78040.610 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2082/ 159576 | consumed samples: 33312 | elapsed time per iteration (ms): 13653.7 | learning rate: 9.231E-06 | global batch size: 16 | lm loss: 7.014567E+00 | loss scale: 16384.0 | grad norm: 88063.955 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2083/ 159576 | consumed samples: 33328 | elapsed time per iteration (ms): 13690.1 | learning rate: 9.235E-06 | global batch size: 16 | lm loss: 6.964838E+00 | loss scale: 16384.0 | grad norm: 68577.460 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2084/ 159576 | consumed samples: 33344 | elapsed time per iteration (ms): 14064.9 | learning rate: 9.240E-06 | global batch size: 16 | lm loss: 6.954602E+00 | loss scale: 16384.0 | grad norm: 70285.947 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2085/ 159576 | consumed samples: 33360 | elapsed time per iteration (ms): 13835.0 | learning rate: 9.244E-06 | global batch size: 16 | lm loss: 6.952052E+00 | loss scale: 16384.0 | grad norm: 85673.500 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time 
(ms) iteration 2086/ 159576 | consumed samples: 33376 | elapsed time per iteration (ms): 13813.8 | learning rate: 9.249E-06 | global batch size: 16 | lm loss: 6.909387E+00 | loss scale: 16384.0 | grad norm: 118966.507 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2087/ 159576 | consumed samples: 33392 | elapsed time per iteration (ms): 13678.6 | learning rate: 9.253E-06 | global batch size: 16 | lm loss: 6.961540E+00 | loss scale: 16384.0 | grad norm: 66329.626 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2088/ 159576 | consumed samples: 33408 | elapsed time per iteration (ms): 13699.4 | learning rate: 9.257E-06 | global batch size: 16 | lm loss: 7.038545E+00 | loss scale: 16384.0 | grad norm: 77147.534 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2089/ 159576 | consumed samples: 33424 | elapsed time per iteration (ms): 13870.3 | learning rate: 9.262E-06 | global batch size: 16 | lm loss: 6.829208E+00 | loss scale: 16384.0 | grad norm: 66850.604 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2090/ 159576 | consumed samples: 33440 | elapsed time per iteration (ms): 13553.2 | learning rate: 9.266E-06 | global batch size: 16 | lm loss: 6.885040E+00 | loss scale: 16384.0 | grad norm: 63418.965 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2091/ 159576 | consumed samples: 33456 | elapsed time per iteration (ms): 13563.4 | learning rate: 9.271E-06 | global batch size: 16 | lm loss: 7.227287E+00 | loss scale: 16384.0 | grad norm: 99229.607 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2092/ 159576 | consumed samples: 33472 | elapsed time per iteration (ms): 13616.1 | learning rate: 9.275E-06 | global batch size: 16 | lm loss: 
7.151490E+00 | loss scale: 16384.0 | grad norm: 77793.238 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2093/ 159576 | consumed samples: 33488 | elapsed time per iteration (ms): 14020.5 | learning rate: 9.280E-06 | global batch size: 16 | lm loss: 6.956719E+00 | loss scale: 16384.0 | grad norm: 71078.394 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2094/ 159576 | consumed samples: 33504 | elapsed time per iteration (ms): 13583.2 | learning rate: 9.284E-06 | global batch size: 16 | lm loss: 6.863022E+00 | loss scale: 16384.0 | grad norm: 75874.396 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2095/ 159576 | consumed samples: 33520 | elapsed time per iteration (ms): 13540.7 | learning rate: 9.288E-06 | global batch size: 16 | lm loss: 7.230942E+00 | loss scale: 16384.0 | grad norm: 66376.740 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2096/ 159576 | consumed samples: 33536 | elapsed time per iteration (ms): 13617.6 | learning rate: 9.293E-06 | global batch size: 16 | lm loss: 6.938297E+00 | loss scale: 16384.0 | grad norm: 80597.438 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2097/ 159576 | consumed samples: 33552 | elapsed time per iteration (ms): 13611.2 | learning rate: 9.297E-06 | global batch size: 16 | lm loss: 6.750860E+00 | loss scale: 16384.0 | grad norm: 50768.159 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2098/ 159576 | consumed samples: 33568 | elapsed time per iteration (ms): 13781.0 | learning rate: 9.302E-06 | global batch size: 16 | lm loss: 6.866726E+00 | loss scale: 16384.0 | grad norm: 120258.979 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2099/ 
159576 | consumed samples: 33584 | elapsed time per iteration (ms): 13657.4 | learning rate: 9.306E-06 | global batch size: 16 | lm loss: 6.825637E+00 | loss scale: 16384.0 | grad norm: 95301.455 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2100/ 159576 | consumed samples: 33600 | elapsed time per iteration (ms): 13666.9 | learning rate: 9.311E-06 | global batch size: 16 | lm loss: 6.864701E+00 | loss scale: 16384.0 | grad norm: 68908.392 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2101/ 159576 | consumed samples: 33616 | elapsed time per iteration (ms): 13629.3 | learning rate: 9.315E-06 | global batch size: 16 | lm loss: 6.992301E+00 | loss scale: 16384.0 | grad norm: 74768.073 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2102/ 159576 | consumed samples: 33632 | elapsed time per iteration (ms): 14067.7 | learning rate: 9.320E-06 | global batch size: 16 | lm loss: 7.044778E+00 | loss scale: 16384.0 | grad norm: 118054.957 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2103/ 159576 | consumed samples: 33648 | elapsed time per iteration (ms): 13615.1 | learning rate: 9.324E-06 | global batch size: 16 | lm loss: 7.033617E+00 | loss scale: 16384.0 | grad norm: 69826.634 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2104/ 159576 | consumed samples: 33664 | elapsed time per iteration (ms): 13577.5 | learning rate: 9.328E-06 | global batch size: 16 | lm loss: 6.970243E+00 | loss scale: 16384.0 | grad norm: 88873.689 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2105/ 159576 | consumed samples: 33680 | elapsed time per iteration (ms): 13581.9 | learning rate: 9.333E-06 | global batch size: 16 | lm loss: 6.917067E+00 | loss scale: 
16384.0 | grad norm: 93657.084 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2106/ 159576 | consumed samples: 33696 | elapsed time per iteration (ms): 14007.1 | learning rate: 9.337E-06 | global batch size: 16 | lm loss: 7.027580E+00 | loss scale: 16384.0 | grad norm: 62511.740 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2107/ 159576 | consumed samples: 33712 | elapsed time per iteration (ms): 13598.0 | learning rate: 9.342E-06 | global batch size: 16 | lm loss: 7.132909E+00 | loss scale: 16384.0 | grad norm: 177960.752 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2108/ 159576 | consumed samples: 33728 | elapsed time per iteration (ms): 13635.0 | learning rate: 9.346E-06 | global batch size: 16 | lm loss: 7.048873E+00 | loss scale: 16384.0 | grad norm: 122116.263 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2109/ 159576 | consumed samples: 33744 | elapsed time per iteration (ms): 13663.3 | learning rate: 9.351E-06 | global batch size: 16 | lm loss: 6.996678E+00 | loss scale: 16384.0 | grad norm: 85763.068 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2110/ 159576 | consumed samples: 33760 | elapsed time per iteration (ms): 13680.8 | learning rate: 9.355E-06 | global batch size: 16 | lm loss: 6.889836E+00 | loss scale: 16384.0 | grad norm: 84089.334 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2111/ 159576 | consumed samples: 33776 | elapsed time per iteration (ms): 13628.5 | learning rate: 9.359E-06 | global batch size: 16 | lm loss: 6.968468E+00 | loss scale: 16384.0 | grad norm: 51256.696 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2112/ 159576 | consumed samples: 
33792 | elapsed time per iteration (ms): 13610.9 | learning rate: 9.364E-06 | global batch size: 16 | lm loss: 6.917239E+00 | loss scale: 16384.0 | grad norm: 126008.694 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2113/ 159576 | consumed samples: 33808 | elapsed time per iteration (ms): 13593.1 | learning rate: 9.368E-06 | global batch size: 16 | lm loss: 6.871556E+00 | loss scale: 16384.0 | grad norm: 67758.611 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2114/ 159576 | consumed samples: 33824 | elapsed time per iteration (ms): 13663.1 | learning rate: 9.373E-06 | global batch size: 16 | lm loss: 6.927833E+00 | loss scale: 16384.0 | grad norm: 85851.310 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2115/ 159576 | consumed samples: 33840 | elapsed time per iteration (ms): 13986.1 | learning rate: 9.377E-06 | global batch size: 16 | lm loss: 6.965062E+00 | loss scale: 16384.0 | grad norm: 65169.232 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2116/ 159576 | consumed samples: 33856 | elapsed time per iteration (ms): 13585.2 | learning rate: 9.382E-06 | global batch size: 16 | lm loss: 7.081017E+00 | loss scale: 16384.0 | grad norm: 73782.925 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2117/ 159576 | consumed samples: 33872 | elapsed time per iteration (ms): 13717.9 | learning rate: 9.386E-06 | global batch size: 16 | lm loss: 7.005242E+00 | loss scale: 16384.0 | grad norm: 125037.412 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2118/ 159576 | consumed samples: 33888 | elapsed time per iteration (ms): 13567.3 | learning rate: 9.391E-06 | global batch size: 16 | lm loss: 6.785961E+00 | loss scale: 16384.0 | grad norm: 
74382.903 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2119/ 159576 | consumed samples: 33904 | elapsed time per iteration (ms): 13839.4 | learning rate: 9.395E-06 | global batch size: 16 | lm loss: 7.037541E+00 | loss scale: 16384.0 | grad norm: 61070.302 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2120/ 159576 | consumed samples: 33920 | elapsed time per iteration (ms): 13840.1 | learning rate: 9.399E-06 | global batch size: 16 | lm loss: 6.688106E+00 | loss scale: 16384.0 | grad norm: 77514.323 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2121/ 159576 | consumed samples: 33936 | elapsed time per iteration (ms): 13591.3 | learning rate: 9.404E-06 | global batch size: 16 | lm loss: 6.965182E+00 | loss scale: 16384.0 | grad norm: 85559.683 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2122/ 159576 | consumed samples: 33952 | elapsed time per iteration (ms): 13658.1 | learning rate: 9.408E-06 | global batch size: 16 | lm loss: 6.891047E+00 | loss scale: 16384.0 | grad norm: 84454.855 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2123/ 159576 | consumed samples: 33968 | elapsed time per iteration (ms): 13650.8 | learning rate: 9.413E-06 | global batch size: 16 | lm loss: 6.784370E+00 | loss scale: 16384.0 | grad norm: 74803.193 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2124/ 159576 | consumed samples: 33984 | elapsed time per iteration (ms): 13935.2 | learning rate: 9.417E-06 | global batch size: 16 | lm loss: 6.885671E+00 | loss scale: 16384.0 | grad norm: 68340.117 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2125/ 159576 | consumed samples: 34000 | elapsed time 
per iteration (ms): 13650.4 | learning rate: 9.422E-06 | global batch size: 16 | lm loss: 7.116186E+00 | loss scale: 16384.0 | grad norm: 75719.601 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2126/ 159576 | consumed samples: 34016 | elapsed time per iteration (ms): 13617.2 | learning rate: 9.426E-06 | global batch size: 16 | lm loss: 6.759393E+00 | loss scale: 16384.0 | grad norm: 57051.263 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2127/ 159576 | consumed samples: 34032 | elapsed time per iteration (ms): 13606.4 | learning rate: 9.430E-06 | global batch size: 16 | lm loss: 6.895882E+00 | loss scale: 16384.0 | grad norm: 117422.540 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2128/ 159576 | consumed samples: 34048 | elapsed time per iteration (ms): 13879.5 | learning rate: 9.435E-06 | global batch size: 16 | lm loss: 6.990780E+00 | loss scale: 16384.0 | grad norm: 47327.196 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2129/ 159576 | consumed samples: 34064 | elapsed time per iteration (ms): 13685.2 | learning rate: 9.439E-06 | global batch size: 16 | lm loss: 6.883922E+00 | loss scale: 16384.0 | grad norm: 75631.645 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2130/ 159576 | consumed samples: 34080 | elapsed time per iteration (ms): 13677.5 | learning rate: 9.444E-06 | global batch size: 16 | lm loss: 6.880146E+00 | loss scale: 16384.0 | grad norm: 70634.211 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2131/ 159576 | consumed samples: 34096 | elapsed time per iteration (ms): 13735.8 | learning rate: 9.448E-06 | global batch size: 16 | lm loss: 6.800762E+00 | loss scale: 16384.0 | grad norm: 114482.498 | num zeros: 0.0 | 
number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2132/ 159576 | consumed samples: 34112 | elapsed time per iteration (ms): 13614.4 | learning rate: 9.453E-06 | global batch size: 16 | lm loss: 7.057775E+00 | loss scale: 16384.0 | grad norm: 131631.194 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2133/ 159576 | consumed samples: 34128 | elapsed time per iteration (ms): 13899.1 | learning rate: 9.457E-06 | global batch size: 16 | lm loss: 7.006071E+00 | loss scale: 16384.0 | grad norm: 88510.853 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2134/ 159576 | consumed samples: 34144 | elapsed time per iteration (ms): 13637.7 | learning rate: 9.462E-06 | global batch size: 16 | lm loss: 7.062113E+00 | loss scale: 16384.0 | grad norm: 75449.578 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2135/ 159576 | consumed samples: 34160 | elapsed time per iteration (ms): 13602.2 | learning rate: 9.466E-06 | global batch size: 16 | lm loss: 7.078564E+00 | loss scale: 16384.0 | grad norm: 130110.649 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2136/ 159576 | consumed samples: 34176 | elapsed time per iteration (ms): 13592.0 | learning rate: 9.470E-06 | global batch size: 16 | lm loss: 6.814717E+00 | loss scale: 16384.0 | grad norm: 149407.279 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2137/ 159576 | consumed samples: 34192 | elapsed time per iteration (ms): 14082.9 | learning rate: 9.475E-06 | global batch size: 16 | lm loss: 6.978102E+00 | loss scale: 16384.0 | grad norm: 53919.012 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2138/ 159576 | consumed samples: 34208 | elapsed time per iteration (ms): 13782.2 
| learning rate: 9.479E-06 | global batch size: 16 | lm loss: 6.799563E+00 | loss scale: 16384.0 | grad norm: 71961.337 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2139/ 159576 | consumed samples: 34224 | elapsed time per iteration (ms): 13617.0 | learning rate: 9.484E-06 | global batch size: 16 | lm loss: 6.855867E+00 | loss scale: 16384.0 | grad norm: 59818.367 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2140/ 159576 | consumed samples: 34240 | elapsed time per iteration (ms): 13639.2 | learning rate: 9.488E-06 | global batch size: 16 | lm loss: 6.902345E+00 | loss scale: 16384.0 | grad norm: 58890.649 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2141/ 159576 | consumed samples: 34256 | elapsed time per iteration (ms): 13987.1 | learning rate: 9.493E-06 | global batch size: 16 | lm loss: 6.755795E+00 | loss scale: 16384.0 | grad norm: 77002.230 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2142/ 159576 | consumed samples: 34272 | elapsed time per iteration (ms): 13630.0 | learning rate: 9.497E-06 | global batch size: 16 | lm loss: 6.875304E+00 | loss scale: 16384.0 | grad norm: 67923.163 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2143/ 159576 | consumed samples: 34288 | elapsed time per iteration (ms): 13550.6 | learning rate: 9.501E-06 | global batch size: 16 | lm loss: 6.950579E+00 | loss scale: 16384.0 | grad norm: 177721.329 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2144/ 159576 | consumed samples: 34304 | elapsed time per iteration (ms): 13618.0 | learning rate: 9.506E-06 | global batch size: 16 | lm loss: 6.968021E+00 | loss scale: 16384.0 | grad norm: 116784.963 | num zeros: 0.0 | number of skipped 
Training log, iterations 2145–2275 of 159576, one record per row. Fields constant across every record: global batch size: 16 | loss scale: 16384.0 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0.

iteration | consumed samples | elapsed time per iteration (ms) | learning rate | lm loss | grad norm
2145 | 34320 | 13676.0 | 9.510E-06 | 6.878886E+00 | 69612.138
2146 | 34336 | 13771.3 | 9.515E-06 | 6.903853E+00 | 80623.990
2147 | 34352 | 13687.5 | 9.519E-06 | 6.992352E+00 | 50990.170
2148 | 34368 | 13681.5 | 9.524E-06 | 6.979048E+00 | 120685.818
2149 | 34384 | 13585.6 | 9.528E-06 | 6.962264E+00 | 95096.210
2150 | 34400 | 13964.4 | 9.533E-06 | 7.070148E+00 | 102834.582
2151 | 34416 | 13597.2 | 9.537E-06 | 6.998973E+00 | 66036.970
2152 | 34432 | 13608.8 | 9.541E-06 | 6.972906E+00 | 85292.027
2153 | 34448 | 13623.2 | 9.546E-06 | 6.755056E+00 | 76762.492
2154 | 34464 | 13956.2 | 9.550E-06 | 7.015395E+00 | 90062.733
2155 | 34480 | 13759.1 | 9.555E-06 | 6.815333E+00 | 68441.221
2156 | 34496 | 13580.0 | 9.559E-06 | 6.783628E+00 | 110716.577
2157 | 34512 | 13582.3 | 9.564E-06 | 7.064082E+00 | 62285.534
2158 | 34528 | 13596.2 | 9.568E-06 | 7.092577E+00 | 69925.096
2159 | 34544 | 13966.6 | 9.572E-06 | 7.030209E+00 | 74908.048
2160 | 34560 | 13608.2 | 9.577E-06 | 6.985407E+00 | 107105.025
2161 | 34576 | 13591.8 | 9.581E-06 | 6.846824E+00 | 59511.297
2162 | 34592 | 13686.7 | 9.586E-06 | 6.984041E+00 | 81334.026
2163 | 34608 | 13937.5 | 9.590E-06 | 7.022871E+00 | 84185.459
2164 | 34624 | 13577.7 | 9.595E-06 | 7.029066E+00 | 47624.311
2165 | 34640 | 13595.6 | 9.599E-06 | 6.822045E+00 | 138589.166
2166 | 34656 | 13704.6 | 9.604E-06 | 6.980874E+00 | 80500.034
2167 | 34672 | 13517.8 | 9.608E-06 | 7.052095E+00 | 68630.752
2168 | 34688 | 13832.6 | 9.612E-06 | 7.172165E+00 | 59001.711
2169 | 34704 | 13681.3 | 9.617E-06 | 7.068394E+00 | 73598.207
2170 | 34720 | 13669.0 | 9.621E-06 | 6.842896E+00 | 62440.681
2171 | 34736 | 13648.5 | 9.626E-06 | 7.126867E+00 | 155364.353
2172 | 34752 | 14078.1 | 9.630E-06 | 7.047744E+00 | 113473.385
2173 | 34768 | 13680.5 | 9.635E-06 | 7.016094E+00 | 73489.301
2174 | 34784 | 13666.0 | 9.639E-06 | 7.061403E+00 | 75521.374
2175 | 34800 | 13610.4 | 9.643E-06 | 7.042882E+00 | 95300.955
2176 | 34816 | 14108.9 | 9.648E-06 | 6.915576E+00 | 74751.665
2177 | 34832 | 13643.1 | 9.652E-06 | 6.979721E+00 | 71252.622
2178 | 34848 | 13642.9 | 9.657E-06 | 6.816618E+00 | 60039.955
2179 | 34864 | 13628.9 | 9.661E-06 | 7.054741E+00 | 196305.881
2180 | 34880 | 13588.5 | 9.666E-06 | 6.953914E+00 | 120715.141
2181 | 34896 | 13968.3 | 9.670E-06 | 7.034101E+00 | 81756.186
2182 | 34912 | 13658.7 | 9.675E-06 | 6.787637E+00 | 99431.755
2183 | 34928 | 13669.1 | 9.679E-06 | 6.894065E+00 | 83400.667
2184 | 34944 | 13649.9 | 9.683E-06 | 6.871455E+00 | 159204.546
2185 | 34960 | 14059.0 | 9.688E-06 | 6.954823E+00 | 106187.044
2186 | 34976 | 13651.8 | 9.692E-06 | 7.198211E+00 | 95306.127
2187 | 34992 | 13612.8 | 9.697E-06 | 7.037758E+00 | 86743.620
2188 | 35008 | 13616.1 | 9.701E-06 | 6.780216E+00 | 66759.645
2189 | 35024 | 13935.4 | 9.706E-06 | 7.134370E+00 | 224387.512
2190 | 35040 | 13796.3 | 9.710E-06 | 6.830962E+00 | 184503.407
2191 | 35056 | 13596.6 | 9.714E-06 | 7.006136E+00 | 105791.757
2192 | 35072 | 13632.0 | 9.719E-06 | 7.023957E+00 | 128317.920
2193 | 35088 | 13700.7 | 9.723E-06 | 6.920637E+00 | 90884.730
2194 | 35104 | 13995.7 | 9.728E-06 | 7.240769E+00 | 157352.501
2195 | 35120 | 13669.4 | 9.732E-06 | 6.780205E+00 | 106455.602
2196 | 35136 | 13670.0 | 9.737E-06 | 6.778285E+00 | 86879.374
2197 | 35152 | 13661.3 | 9.741E-06 | 7.030122E+00 | 93377.129
2198 | 35168 | 13923.4 | 9.746E-06 | 6.727036E+00 | 148918.392
2199 | 35184 | 13675.4 | 9.750E-06 | 7.104040E+00 | 135532.675
2200 | 35200 | 13739.5 | 9.754E-06 | 6.969880E+00 | 96195.135
2201 | 35216 | 13703.1 | 9.759E-06 | 7.123239E+00 | 89259.239
2202 | 35232 | 13665.4 | 9.763E-06 | 6.652438E+00 | 70165.954
2203 | 35248 | 13954.1 | 9.768E-06 | 6.943371E+00 | 138696.234
2204 | 35264 | 13604.7 | 9.772E-06 | 6.743501E+00 | 190526.042
2205 | 35280 | 13626.5 | 9.777E-06 | 6.968715E+00 | 97137.923
2206 | 35296 | 13767.5 | 9.781E-06 | 6.911567E+00 | 68778.743
2207 | 35312 | 14159.2 | 9.786E-06 | 7.117369E+00 | 70066.331
2208 | 35328 | 13832.5 | 9.790E-06 | 7.121370E+00 | 98891.631
2209 | 35344 | 13749.3 | 9.794E-06 | 6.873634E+00 | 61060.289
2210 | 35360 | 13710.7 | 9.799E-06 | 6.761906E+00 | 87340.173
2211 | 35376 | 14073.4 | 9.803E-06 | 6.896225E+00 | 67623.817
2212 | 35392 | 13676.6 | 9.808E-06 | 6.925282E+00 | 112986.049
2213 | 35408 | 13682.0 | 9.812E-06 | 6.932837E+00 | 72538.119
2214 | 35424 | 13773.0 | 9.817E-06 | 6.751261E+00 | 110253.980
2215 | 35440 | 13688.8 | 9.821E-06 | 6.953260E+00 | 85951.671
2216 | 35456 | 13877.0 | 9.825E-06 | 6.963014E+00 | 78883.228
2217 | 35472 | 13727.8 | 9.830E-06 | 6.840832E+00 | 92435.156
2218 | 35488 | 13750.4 | 9.834E-06 | 6.949021E+00 | 60313.225
2219 | 35504 | 13607.8 | 9.839E-06 | 6.950431E+00 | 92434.517
2220 | 35520 | 14159.9 | 9.843E-06 | 7.318023E+00 | 75178.025
2221 | 35536 | 13828.1 | 9.848E-06 | 6.425551E+00 | 66904.070
2222 | 35552 | 13669.2 | 9.852E-06 | 7.016433E+00 | 48549.102
2223 | 35568 | 13705.5 | 9.857E-06 | 7.026052E+00 | 87253.670
2224 | 35584 | 14141.1 | 9.861E-06 | 7.019730E+00 | 75100.959
2225 | 35600 | 13696.3 | 9.865E-06 | 6.750052E+00 | 72544.618
2226 | 35616 | 13659.8 | 9.870E-06 | 6.815751E+00 | 76403.248
2227 | 35632 | 13696.5 | 9.874E-06 | 6.716208E+00 | 70565.479
2228 | 35648 | 13652.7 | 9.879E-06 | 6.902302E+00 | 99921.530
2229 | 35664 | 13754.5 | 9.883E-06 | 6.941592E+00 | 77045.459
2230 | 35680 | 13726.8 | 9.888E-06 | 7.006780E+00 | 79594.378
2231 | 35696 | 13704.0 | 9.892E-06 | 7.056840E+00 | 72251.485
2232 | 35712 | 13646.8 | 9.896E-06 | 6.913527E+00 | 58442.793
2233 | 35728 | 14009.0 | 9.901E-06 | 6.865626E+00 | 73447.631
2234 | 35744 | 13550.7 | 9.905E-06 | 6.954779E+00 | 63007.809
2235 | 35760 | 13638.3 | 9.910E-06 | 6.917772E+00 | 73029.876
2236 | 35776 | 13495.6 | 9.914E-06 | 6.899360E+00 | 58524.994
2237 | 35792 | 13933.0 | 9.919E-06 | 6.898277E+00 | 89250.802
2238 | 35808 | 13906.4 | 9.923E-06 | 6.863415E+00 | 57965.777
2239 | 35824 | 13638.8 | 9.928E-06 | 6.994671E+00 | 102232.968
2240 | 35840 | 13621.9 | 9.932E-06 | 6.956360E+00 | 69904.385
2241 | 35856 | 13633.2 | 9.936E-06 | 6.939447E+00 | 95578.290
2242 | 35872 | 13726.4 | 9.941E-06 | 7.046509E+00 | 82383.239
2243 | 35888 | 13506.7 | 9.945E-06 | 7.151508E+00 | 98476.196
2244 | 35904 | 13568.6 | 9.950E-06 | 6.872870E+00 | 74912.305
2245 | 35920 | 13602.7 | 9.954E-06 | 6.673596E+00 | 76531.716
2246 | 35936 | 14093.3 | 9.959E-06 | 6.910951E+00 | 90155.766
2247 | 35952 | 13495.1 | 9.963E-06 | 6.761725E+00 | 71637.396
2248 | 35968 | 13629.2 | 9.967E-06 | 6.898269E+00 | 99310.370
2249 | 35984 | 13535.5 | 9.972E-06 | 6.917497E+00 | 74932.151
2250 | 36000 | 13554.8 | 9.976E-06 | 6.728826E+00 | 73535.130
2251 | 36016 | 13742.7 | 9.981E-06 | 6.901268E+00 | 76822.648
2252 | 36032 | 13586.6 | 9.985E-06 | 6.964120E+00 | 47563.266
2253 | 36048 | 13621.0 | 9.990E-06 | 6.976019E+00 | 84584.649
2254 | 36064 | 13682.5 | 9.994E-06 | 6.875343E+00 | 37745.320
2255 | 36080 | 14145.6 | 9.999E-06 | 6.934249E+00 | 136584.539
2256 | 36096 | 13651.1 | 1.000E-05 | 6.785090E+00 | 79752.112
2257 | 36112 | 13492.4 | 1.001E-05 | 6.860191E+00 | 66550.522
2258 | 36128 | 13560.5 | 1.001E-05 | 6.910413E+00 | 67569.003
2259 | 36144 | 14039.9 | 1.002E-05 | 7.188947E+00 | 73452.334
2260 | 36160 | 13575.5 | 1.002E-05 | 6.873131E+00 | 111867.072
2261 | 36176 | 13638.2 | 1.003E-05 | 6.838548E+00 | 80423.624
2262 | 36192 | 13658.9 | 1.003E-05 | 7.019104E+00 | 84663.314
2263 | 36208 | 13616.1 | 1.003E-05 | 6.917726E+00 | 79078.388
2264 | 36224 | 13773.7 | 1.004E-05 | 7.129383E+00 | 84356.561
2265 | 36240 | 13599.9 | 1.004E-05 | 6.950484E+00 | 96317.698
2266 | 36256 | 13555.3 | 1.005E-05 | 6.983542E+00 | 87963.519
2267 | 36272 | 13615.4 | 1.005E-05 | 7.106489E+00 | 49938.774
2268 | 36288 | 13987.6 | 1.006E-05 | 6.957284E+00 | 80083.213
2269 | 36304 | 13613.8 | 1.006E-05 | 6.895617E+00 | 89537.779
2270 | 36320 | 13747.0 | 1.007E-05 | 6.945907E+00 | 109400.041
2271 | 36336 | 13527.2 | 1.007E-05 | 6.928704E+00 | 78576.596
2272 | 36352 | 13615.1 | 1.007E-05 | 7.229642E+00 | 80535.103
2273 | 36368 | 13960.2 | 1.008E-05 | 6.896622E+00 | 65043.229
2274 | 36384 | 13538.8 | 1.008E-05 | 7.013526E+00 | 78284.375
2275 | 36400 | 13634.5 | 1.009E-05 | 6.912004E+00 | 66988.185
| num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2276/ 159576 | consumed samples: 36416 | elapsed time per iteration (ms): 13609.6 | learning rate: 1.009E-05 | global batch size: 16 | lm loss: 6.759723E+00 | loss scale: 16384.0 | grad norm: 69630.646 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2277/ 159576 | consumed samples: 36432 | elapsed time per iteration (ms): 14096.5 | learning rate: 1.010E-05 | global batch size: 16 | lm loss: 7.025202E+00 | loss scale: 16384.0 | grad norm: 66059.779 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2278/ 159576 | consumed samples: 36448 | elapsed time per iteration (ms): 13743.0 | learning rate: 1.010E-05 | global batch size: 16 | lm loss: 6.957587E+00 | loss scale: 16384.0 | grad norm: 80177.800 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2279/ 159576 | consumed samples: 36464 | elapsed time per iteration (ms): 13675.0 | learning rate: 1.011E-05 | global batch size: 16 | lm loss: 6.897773E+00 | loss scale: 16384.0 | grad norm: 50160.689 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2280/ 159576 | consumed samples: 36480 | elapsed time per iteration (ms): 13581.6 | learning rate: 1.011E-05 | global batch size: 16 | lm loss: 6.697253E+00 | loss scale: 16384.0 | grad norm: 64483.166 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2281/ 159576 | consumed samples: 36496 | elapsed time per iteration (ms): 13961.5 | learning rate: 1.011E-05 | global batch size: 16 | lm loss: 6.944922E+00 | loss scale: 16384.0 | grad norm: 67869.220 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2282/ 159576 | consumed samples: 36512 | elapsed time per 
iteration (ms): 13505.0 | learning rate: 1.012E-05 | global batch size: 16 | lm loss: 6.410736E+00 | loss scale: 16384.0 | grad norm: 49766.856 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2283/ 159576 | consumed samples: 36528 | elapsed time per iteration (ms): 13611.4 | learning rate: 1.012E-05 | global batch size: 16 | lm loss: 6.772882E+00 | loss scale: 16384.0 | grad norm: 59961.718 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2284/ 159576 | consumed samples: 36544 | elapsed time per iteration (ms): 13596.5 | learning rate: 1.013E-05 | global batch size: 16 | lm loss: 6.794603E+00 | loss scale: 16384.0 | grad norm: 68562.920 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2285/ 159576 | consumed samples: 36560 | elapsed time per iteration (ms): 13567.2 | learning rate: 1.013E-05 | global batch size: 16 | lm loss: 7.113194E+00 | loss scale: 16384.0 | grad norm: 59728.136 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2286/ 159576 | consumed samples: 36576 | elapsed time per iteration (ms): 13847.6 | learning rate: 1.014E-05 | global batch size: 16 | lm loss: 6.799785E+00 | loss scale: 16384.0 | grad norm: 76247.046 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2287/ 159576 | consumed samples: 36592 | elapsed time per iteration (ms): 13611.9 | learning rate: 1.014E-05 | global batch size: 16 | lm loss: 7.034187E+00 | loss scale: 16384.0 | grad norm: 50151.578 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2288/ 159576 | consumed samples: 36608 | elapsed time per iteration (ms): 13533.2 | learning rate: 1.014E-05 | global batch size: 16 | lm loss: 6.881348E+00 | loss scale: 16384.0 | grad norm: 130377.193 | num zeros: 0.0 | 
number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2289/ 159576 | consumed samples: 36624 | elapsed time per iteration (ms): 13525.7 | learning rate: 1.015E-05 | global batch size: 16 | lm loss: 6.952589E+00 | loss scale: 16384.0 | grad norm: 68434.169 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2290/ 159576 | consumed samples: 36640 | elapsed time per iteration (ms): 13963.1 | learning rate: 1.015E-05 | global batch size: 16 | lm loss: 6.887176E+00 | loss scale: 16384.0 | grad norm: 89636.101 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2291/ 159576 | consumed samples: 36656 | elapsed time per iteration (ms): 13620.5 | learning rate: 1.016E-05 | global batch size: 16 | lm loss: 6.846462E+00 | loss scale: 16384.0 | grad norm: 73199.296 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2292/ 159576 | consumed samples: 36672 | elapsed time per iteration (ms): 13656.0 | learning rate: 1.016E-05 | global batch size: 16 | lm loss: 7.302676E+00 | loss scale: 16384.0 | grad norm: 174677.987 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2293/ 159576 | consumed samples: 36688 | elapsed time per iteration (ms): 13714.2 | learning rate: 1.017E-05 | global batch size: 16 | lm loss: 7.151010E+00 | loss scale: 16384.0 | grad norm: 135612.210 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2294/ 159576 | consumed samples: 36704 | elapsed time per iteration (ms): 13919.9 | learning rate: 1.017E-05 | global batch size: 16 | lm loss: 7.005547E+00 | loss scale: 16384.0 | grad norm: 89084.825 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2295/ 159576 | consumed samples: 36720 | elapsed time per iteration (ms): 13650.1 | 
learning rate: 1.018E-05 | global batch size: 16 | lm loss: 6.588016E+00 | loss scale: 16384.0 | grad norm: 102875.641 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2296/ 159576 | consumed samples: 36736 | elapsed time per iteration (ms): 13574.9 | learning rate: 1.018E-05 | global batch size: 16 | lm loss: 6.896825E+00 | loss scale: 16384.0 | grad norm: 70940.128 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2297/ 159576 | consumed samples: 36752 | elapsed time per iteration (ms): 13573.3 | learning rate: 1.018E-05 | global batch size: 16 | lm loss: 6.883708E+00 | loss scale: 16384.0 | grad norm: 146744.276 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2298/ 159576 | consumed samples: 36768 | elapsed time per iteration (ms): 13649.6 | learning rate: 1.019E-05 | global batch size: 16 | lm loss: 7.139965E+00 | loss scale: 16384.0 | grad norm: 75816.547 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2299/ 159576 | consumed samples: 36784 | elapsed time per iteration (ms): 13959.1 | learning rate: 1.019E-05 | global batch size: 16 | lm loss: 6.811082E+00 | loss scale: 16384.0 | grad norm: 83246.485 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2300/ 159576 | consumed samples: 36800 | elapsed time per iteration (ms): 13736.9 | learning rate: 1.020E-05 | global batch size: 16 | lm loss: 6.719008E+00 | loss scale: 16384.0 | grad norm: 93595.542 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2301/ 159576 | consumed samples: 36816 | elapsed time per iteration (ms): 13666.3 | learning rate: 1.020E-05 | global batch size: 16 | lm loss: 7.039846E+00 | loss scale: 16384.0 | grad norm: 58298.921 | num zeros: 0.0 | number of skipped iterations: 
0 | number of nan iterations: 0 | time (ms) iteration 2302/ 159576 | consumed samples: 36832 | elapsed time per iteration (ms): 13631.9 | learning rate: 1.021E-05 | global batch size: 16 | lm loss: 6.796918E+00 | loss scale: 16384.0 | grad norm: 153620.255 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2303/ 159576 | consumed samples: 36848 | elapsed time per iteration (ms): 13914.1 | learning rate: 1.021E-05 | global batch size: 16 | lm loss: 7.011253E+00 | loss scale: 16384.0 | grad norm: 79116.869 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2304/ 159576 | consumed samples: 36864 | elapsed time per iteration (ms): 13578.2 | learning rate: 1.022E-05 | global batch size: 16 | lm loss: 6.786969E+00 | loss scale: 16384.0 | grad norm: 78214.933 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2305/ 159576 | consumed samples: 36880 | elapsed time per iteration (ms): 13606.1 | learning rate: 1.022E-05 | global batch size: 16 | lm loss: 6.896228E+00 | loss scale: 16384.0 | grad norm: 59758.026 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2306/ 159576 | consumed samples: 36896 | elapsed time per iteration (ms): 13630.5 | learning rate: 1.022E-05 | global batch size: 16 | lm loss: 6.715625E+00 | loss scale: 16384.0 | grad norm: 82018.871 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) [2021-09-24 11:06:17] PULSE: tr8-104B is waiting for the previous job to finish before scheduling a new one using the dependency mechanism (1165978_[1-10%1] on 'gpu_p13' partition) [2021-09-24 11:06:17] PULSE: tr8-104B is running for 5:14:06 since 2021-09-24T05:52:11 (1162855_1 on 'gpu_p13' partition 
(r6i4n[5,7],r6i5n[2,7-8],r6i6n[0,2,6],r7i2n[4-5],r7i6n[2-4],r7i7n[7-8],r8i0n[2-3,5-8],r8i1n[0,2-4],r8i2n8,r8i3n[0-2],r8i5n[3-4],r8i7n[3-8],r9i0n[0-2],r9i1n[0-3],r9i2n[3-5,8],r9i3n[0-1,7-8],r9i4n[0-2],r9i5n[3-8],r9i6n[0,7-8]) iteration 2307/ 159576 | consumed samples: 36912 | elapsed time per iteration (ms): 13695.2 | learning rate: 1.023E-05 | global batch size: 16 | lm loss: 6.898945E+00 | loss scale: 16384.0 | grad norm: 69074.930 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2308/ 159576 | consumed samples: 36928 | elapsed time per iteration (ms): 13864.3 | learning rate: 1.023E-05 | global batch size: 16 | lm loss: 6.896221E+00 | loss scale: 16384.0 | grad norm: 86879.176 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2309/ 159576 | consumed samples: 36944 | elapsed time per iteration (ms): 13567.7 | learning rate: 1.024E-05 | global batch size: 16 | lm loss: 6.747959E+00 | loss scale: 16384.0 | grad norm: 77379.473 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2310/ 159576 | consumed samples: 36960 | elapsed time per iteration (ms): 13717.6 | learning rate: 1.024E-05 | global batch size: 16 | lm loss: 6.945070E+00 | loss scale: 16384.0 | grad norm: 55236.968 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2311/ 159576 | consumed samples: 36976 | elapsed time per iteration (ms): 13519.2 | learning rate: 1.025E-05 | global batch size: 16 | lm loss: 7.033360E+00 | loss scale: 16384.0 | grad norm: 184283.626 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2312/ 159576 | consumed samples: 36992 | elapsed time per iteration (ms): 14030.2 | learning rate: 1.025E-05 | global batch size: 16 | lm loss: 7.147439E+00 | loss scale: 16384.0 | grad norm: 152407.329 | num zeros: 0.0 | number of 
skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2313/ 159576 | consumed samples: 37008 | elapsed time per iteration (ms): 13685.4 | learning rate: 1.026E-05 | global batch size: 16 | lm loss: 6.739760E+00 | loss scale: 16384.0 | grad norm: 71801.831 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2314/ 159576 | consumed samples: 37024 | elapsed time per iteration (ms): 13648.0 | learning rate: 1.026E-05 | global batch size: 16 | lm loss: 6.839672E+00 | loss scale: 16384.0 | grad norm: 112304.747 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2315/ 159576 | consumed samples: 37040 | elapsed time per iteration (ms): 13683.0 | learning rate: 1.026E-05 | global batch size: 16 | lm loss: 6.987888E+00 | loss scale: 16384.0 | grad norm: 97383.621 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2316/ 159576 | consumed samples: 37056 | elapsed time per iteration (ms): 14019.7 | learning rate: 1.027E-05 | global batch size: 16 | lm loss: 6.766959E+00 | loss scale: 16384.0 | grad norm: 70142.885 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2317/ 159576 | consumed samples: 37072 | elapsed time per iteration (ms): 13698.7 | learning rate: 1.027E-05 | global batch size: 16 | lm loss: 7.002495E+00 | loss scale: 16384.0 | grad norm: 94556.236 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2318/ 159576 | consumed samples: 37088 | elapsed time per iteration (ms): 13548.8 | learning rate: 1.028E-05 | global batch size: 16 | lm loss: 6.785909E+00 | loss scale: 16384.0 | grad norm: 84852.097 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2319/ 159576 | consumed samples: 37104 | elapsed time per iteration (ms): 13558.1 | learning 
rate: 1.028E-05 | global batch size: 16 | lm loss: 6.969275E+00 | loss scale: 16384.0 | grad norm: 88628.295 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2320/ 159576 | consumed samples: 37120 | elapsed time per iteration (ms): 13584.6 | learning rate: 1.029E-05 | global batch size: 16 | lm loss: 6.991512E+00 | loss scale: 16384.0 | grad norm: 73561.859 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2321/ 159576 | consumed samples: 37136 | elapsed time per iteration (ms): 13808.4 | learning rate: 1.029E-05 | global batch size: 16 | lm loss: 6.689001E+00 | loss scale: 16384.0 | grad norm: 79235.505 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2322/ 159576 | consumed samples: 37152 | elapsed time per iteration (ms): 13660.8 | learning rate: 1.030E-05 | global batch size: 16 | lm loss: 6.829502E+00 | loss scale: 16384.0 | grad norm: 69229.325 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2323/ 159576 | consumed samples: 37168 | elapsed time per iteration (ms): 13667.4 | learning rate: 1.030E-05 | global batch size: 16 | lm loss: 6.532575E+00 | loss scale: 16384.0 | grad norm: 55927.225 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2324/ 159576 | consumed samples: 37184 | elapsed time per iteration (ms): 13703.5 | learning rate: 1.030E-05 | global batch size: 16 | lm loss: 6.922344E+00 | loss scale: 16384.0 | grad norm: 55395.267 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2325/ 159576 | consumed samples: 37200 | elapsed time per iteration (ms): 14028.0 | learning rate: 1.031E-05 | global batch size: 16 | lm loss: 6.827266E+00 | loss scale: 16384.0 | grad norm: 53256.272 | num zeros: 0.0 | number of skipped iterations: 0 | number 
of nan iterations: 0 | time (ms) iteration 2326/ 159576 | consumed samples: 37216 | elapsed time per iteration (ms): 13463.4 | learning rate: 1.031E-05 | global batch size: 16 | lm loss: 6.792019E+00 | loss scale: 16384.0 | grad norm: 61740.952 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2327/ 159576 | consumed samples: 37232 | elapsed time per iteration (ms): 13567.6 | learning rate: 1.032E-05 | global batch size: 16 | lm loss: 6.871485E+00 | loss scale: 16384.0 | grad norm: 65916.886 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2328/ 159576 | consumed samples: 37248 | elapsed time per iteration (ms): 13610.6 | learning rate: 1.032E-05 | global batch size: 16 | lm loss: 6.773655E+00 | loss scale: 16384.0 | grad norm: 55451.884 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2329/ 159576 | consumed samples: 37264 | elapsed time per iteration (ms): 13843.3 | learning rate: 1.033E-05 | global batch size: 16 | lm loss: 6.881806E+00 | loss scale: 16384.0 | grad norm: 68242.844 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2330/ 159576 | consumed samples: 37280 | elapsed time per iteration (ms): 13903.0 | learning rate: 1.033E-05 | global batch size: 16 | lm loss: 6.769863E+00 | loss scale: 16384.0 | grad norm: 54395.878 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2331/ 159576 | consumed samples: 37296 | elapsed time per iteration (ms): 13689.8 | learning rate: 1.034E-05 | global batch size: 16 | lm loss: 6.915558E+00 | loss scale: 16384.0 | grad norm: 69787.282 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2332/ 159576 | consumed samples: 37312 | elapsed time per iteration (ms): 13584.4 | learning rate: 1.034E-05 | global batch 
size: 16 | lm loss: 6.872691E+00 | loss scale: 16384.0 | grad norm: 53158.222 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2333/ 159576 | consumed samples: 37328 | elapsed time per iteration (ms): 13510.8 | learning rate: 1.034E-05 | global batch size: 16 | lm loss: 6.772065E+00 | loss scale: 16384.0 | grad norm: 62866.204 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2334/ 159576 | consumed samples: 37344 | elapsed time per iteration (ms): 13981.1 | learning rate: 1.035E-05 | global batch size: 16 | lm loss: 6.889673E+00 | loss scale: 16384.0 | grad norm: 79595.177 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2335/ 159576 | consumed samples: 37360 | elapsed time per iteration (ms): 13567.6 | learning rate: 1.035E-05 | global batch size: 16 | lm loss: 6.996318E+00 | loss scale: 16384.0 | grad norm: 47255.254 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2336/ 159576 | consumed samples: 37376 | elapsed time per iteration (ms): 13643.5 | learning rate: 1.036E-05 | global batch size: 16 | lm loss: 6.824782E+00 | loss scale: 16384.0 | grad norm: 152401.829 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2337/ 159576 | consumed samples: 37392 | elapsed time per iteration (ms): 13630.4 | learning rate: 1.036E-05 | global batch size: 16 | lm loss: 6.711504E+00 | loss scale: 16384.0 | grad norm: 73188.569 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2338/ 159576 | consumed samples: 37408 | elapsed time per iteration (ms): 14043.0 | learning rate: 1.037E-05 | global batch size: 16 | lm loss: 6.830018E+00 | loss scale: 16384.0 | grad norm: 92791.023 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time 
(ms) iteration 2339/ 159576 | consumed samples: 37424 | elapsed time per iteration (ms): 13758.4 | learning rate: 1.037E-05 | global batch size: 16 | lm loss: 7.017688E+00 | loss scale: 16384.0 | grad norm: 87062.269 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2340/ 159576 | consumed samples: 37440 | elapsed time per iteration (ms): 13518.0 | learning rate: 1.038E-05 | global batch size: 16 | lm loss: 6.749167E+00 | loss scale: 16384.0 | grad norm: 72774.580 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2341/ 159576 | consumed samples: 37456 | elapsed time per iteration (ms): 13582.6 | learning rate: 1.038E-05 | global batch size: 16 | lm loss: 7.188419E+00 | loss scale: 16384.0 | grad norm: 400324.609 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2342/ 159576 | consumed samples: 37472 | elapsed time per iteration (ms): 13646.9 | learning rate: 1.038E-05 | global batch size: 16 | lm loss: 7.124457E+00 | loss scale: 16384.0 | grad norm: 441674.699 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2343/ 159576 | consumed samples: 37488 | elapsed time per iteration (ms): 13721.9 | learning rate: 1.039E-05 | global batch size: 16 | lm loss: 6.941244E+00 | loss scale: 16384.0 | grad norm: 218702.636 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2344/ 159576 | consumed samples: 37504 | elapsed time per iteration (ms): 13653.7 | learning rate: 1.039E-05 | global batch size: 16 | lm loss: 6.768173E+00 | loss scale: 16384.0 | grad norm: 93071.046 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2345/ 159576 | consumed samples: 37520 | elapsed time per iteration (ms): 13684.4 | learning rate: 1.040E-05 | global batch size: 16 | lm loss: 
6.862311E+00 | loss scale: 16384.0 | grad norm: 105985.790 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2346/ 159576 | consumed samples: 37536 | elapsed time per iteration (ms): 13732.9 | learning rate: 1.040E-05 | global batch size: 16 | lm loss: 7.097474E+00 | loss scale: 16384.0 | grad norm: 93646.720 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2347/ 159576 | consumed samples: 37552 | elapsed time per iteration (ms): 14087.6 | learning rate: 1.041E-05 | global batch size: 16 | lm loss: 6.949347E+00 | loss scale: 16384.0 | grad norm: 169536.748 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2348/ 159576 | consumed samples: 37568 | elapsed time per iteration (ms): 13603.2 | learning rate: 1.041E-05 | global batch size: 16 | lm loss: 6.839984E+00 | loss scale: 16384.0 | grad norm: 221068.794 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2349/ 159576 | consumed samples: 37584 | elapsed time per iteration (ms): 13602.7 | learning rate: 1.042E-05 | global batch size: 16 | lm loss: 6.722544E+00 | loss scale: 16384.0 | grad norm: 90138.978 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2350/ 159576 | consumed samples: 37600 | elapsed time per iteration (ms): 13600.0 | learning rate: 1.042E-05 | global batch size: 16 | lm loss: 6.765959E+00 | loss scale: 16384.0 | grad norm: 87849.268 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2351/ 159576 | consumed samples: 37616 | elapsed time per iteration (ms): 14049.9 | learning rate: 1.042E-05 | global batch size: 16 | lm loss: 7.058582E+00 | loss scale: 16384.0 | grad norm: 97203.038 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2352/ 
159576 | consumed samples: 37632 | elapsed time per iteration (ms): 13664.4 | learning rate: 1.043E-05 | global batch size: 16 | lm loss: 6.709276E+00 | loss scale: 16384.0 | grad norm: 64321.034 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2353/ 159576 | consumed samples: 37648 | elapsed time per iteration (ms): 13697.2 | learning rate: 1.043E-05 | global batch size: 16 | lm loss: 6.963477E+00 | loss scale: 16384.0 | grad norm: 219491.874 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2354/ 159576 | consumed samples: 37664 | elapsed time per iteration (ms): 13647.8 | learning rate: 1.044E-05 | global batch size: 16 | lm loss: 6.986011E+00 | loss scale: 16384.0 | grad norm: 159710.177 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2355/ 159576 | consumed samples: 37680 | elapsed time per iteration (ms): 13594.7 | learning rate: 1.044E-05 | global batch size: 16 | lm loss: 6.833197E+00 | loss scale: 16384.0 | grad norm: 97227.942 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2356/ 159576 | consumed samples: 37696 | elapsed time per iteration (ms): 13840.6 | learning rate: 1.045E-05 | global batch size: 16 | lm loss: 7.008437E+00 | loss scale: 16384.0 | grad norm: 89122.852 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2357/ 159576 | consumed samples: 37712 | elapsed time per iteration (ms): 13588.8 | learning rate: 1.045E-05 | global batch size: 16 | lm loss: 6.835823E+00 | loss scale: 16384.0 | grad norm: 77947.804 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2358/ 159576 | consumed samples: 37728 | elapsed time per iteration (ms): 13642.6 | learning rate: 1.046E-05 | global batch size: 16 | lm loss: 6.735652E+00 | loss scale: 
16384.0 | grad norm: 162106.613 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2359/ 159576 | consumed samples: 37744 | elapsed time per iteration (ms): 13658.5 | learning rate: 1.046E-05 | global batch size: 16 | lm loss: 6.785017E+00 | loss scale: 16384.0 | grad norm: 128794.072 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2360/ 159576 | consumed samples: 37760 | elapsed time per iteration (ms): 14062.2 | learning rate: 1.046E-05 | global batch size: 16 | lm loss: 6.878942E+00 | loss scale: 16384.0 | grad norm: 101269.625 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2361/ 159576 | consumed samples: 37776 | elapsed time per iteration (ms): 13561.0 | learning rate: 1.047E-05 | global batch size: 16 | lm loss: 6.893463E+00 | loss scale: 16384.0 | grad norm: 78515.515 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2362/ 159576 | consumed samples: 37792 | elapsed time per iteration (ms): 13714.6 | learning rate: 1.047E-05 | global batch size: 16 | lm loss: 6.821845E+00 | loss scale: 16384.0 | grad norm: 78649.404 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2363/ 159576 | consumed samples: 37808 | elapsed time per iteration (ms): 13594.5 | learning rate: 1.048E-05 | global batch size: 16 | lm loss: 6.845947E+00 | loss scale: 16384.0 | grad norm: 158409.972 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2364/ 159576 | consumed samples: 37824 | elapsed time per iteration (ms): 13648.4 | learning rate: 1.048E-05 | global batch size: 16 | lm loss: 6.840971E+00 | loss scale: 16384.0 | grad norm: 88723.502 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2365/ 159576 | consumed 
samples: 37840 | elapsed time per iteration (ms): 13958.9 | learning rate: 1.049E-05 | global batch size: 16 | lm loss: 6.785653E+00 | loss scale: 16384.0 | grad norm: 106713.788 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2366/ 159576 | consumed samples: 37856 | elapsed time per iteration (ms): 13666.9 | learning rate: 1.049E-05 | global batch size: 16 | lm loss: 6.917600E+00 | loss scale: 16384.0 | grad norm: 90335.595 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2367/ 159576 | consumed samples: 37872 | elapsed time per iteration (ms): 13690.6 | learning rate: 1.050E-05 | global batch size: 16 | lm loss: 6.840955E+00 | loss scale: 16384.0 | grad norm: 63357.757 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2368/ 159576 | consumed samples: 37888 | elapsed time per iteration (ms): 13664.8 | learning rate: 1.050E-05 | global batch size: 16 | lm loss: 6.916069E+00 | loss scale: 16384.0 | grad norm: 107961.857 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2369/ 159576 | consumed samples: 37904 | elapsed time per iteration (ms): 14065.2 | learning rate: 1.050E-05 | global batch size: 16 | lm loss: 6.853414E+00 | loss scale: 16384.0 | grad norm: 84442.897 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2370/ 159576 | consumed samples: 37920 | elapsed time per iteration (ms): 13656.3 | learning rate: 1.051E-05 | global batch size: 16 | lm loss: 6.827930E+00 | loss scale: 16384.0 | grad norm: 62880.352 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2371/ 159576 | consumed samples: 37936 | elapsed time per iteration (ms): 13590.5 | learning rate: 1.051E-05 | global batch size: 16 | lm loss: 6.877656E+00 | loss scale: 16384.0 | grad norm: 75866.540 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2372/ 159576 | consumed samples: 37952 | elapsed time per iteration (ms): 13605.0 | learning rate: 1.052E-05 | global batch size: 16 | lm loss: 6.995963E+00 | loss scale: 16384.0 | grad norm: 71192.528 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2373/ 159576 | consumed samples: 37968 | elapsed time per iteration (ms): 13951.5 | learning rate: 1.052E-05 | global batch size: 16 | lm loss: 6.794531E+00 | loss scale: 16384.0 | grad norm: 64517.387 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2374/ 159576 | consumed samples: 37984 | elapsed time per iteration (ms): 13624.2 | learning rate: 1.053E-05 | global batch size: 16 | lm loss: 6.780855E+00 | loss scale: 16384.0 | grad norm: 83255.646 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2375/ 159576 | consumed samples: 38000 | elapsed time per iteration (ms): 13615.3 | learning rate: 1.053E-05 | global batch size: 16 | lm loss: 6.964709E+00 | loss scale: 16384.0 | grad norm: 79867.121 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2376/ 159576 | consumed samples: 38016 | elapsed time per iteration (ms): 13718.1 | learning rate: 1.054E-05 | global batch size: 16 | lm loss: 6.657259E+00 | loss scale: 16384.0 | grad norm: 60555.655 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2377/ 159576 | consumed samples: 38032 | elapsed time per iteration (ms): 13629.0 | learning rate: 1.054E-05 | global batch size: 16 | lm loss: 6.923594E+00 | loss scale: 16384.0 | grad norm: 52753.203 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2378/ 159576 | consumed samples: 38048 | elapsed time per iteration (ms): 13734.6 | learning rate: 1.054E-05 | global batch size: 16 | lm loss: 6.887539E+00 | loss scale: 16384.0 | grad norm: 103430.254 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2379/ 159576 | consumed samples: 38064 | elapsed time per iteration (ms): 13608.8 | learning rate: 1.055E-05 | global batch size: 16 | lm loss: 6.627044E+00 | loss scale: 16384.0 | grad norm: 73977.582 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2380/ 159576 | consumed samples: 38080 | elapsed time per iteration (ms): 13595.9 | learning rate: 1.055E-05 | global batch size: 16 | lm loss: 6.894679E+00 | loss scale: 16384.0 | grad norm: 66400.111 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2381/ 159576 | consumed samples: 38096 | elapsed time per iteration (ms): 13599.7 | learning rate: 1.056E-05 | global batch size: 16 | lm loss: 6.938529E+00 | loss scale: 16384.0 | grad norm: 70512.175 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2382/ 159576 | consumed samples: 38112 | elapsed time per iteration (ms): 14135.5 | learning rate: 1.056E-05 | global batch size: 16 | lm loss: 7.303653E+00 | loss scale: 16384.0 | grad norm: 79783.691 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2383/ 159576 | consumed samples: 38128 | elapsed time per iteration (ms): 13647.3 | learning rate: 1.057E-05 | global batch size: 16 | lm loss: 6.764983E+00 | loss scale: 16384.0 | grad norm: 74049.858 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2384/ 159576 | consumed samples: 38144 | elapsed time per iteration (ms): 13719.9 | learning rate: 1.057E-05 | global batch size: 16 | lm loss: 7.032783E+00 | loss scale: 16384.0 | grad norm: 66855.312 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2385/ 159576 | consumed samples: 38160 | elapsed time per iteration (ms): 13573.5 | learning rate: 1.058E-05 | global batch size: 16 | lm loss: 6.839710E+00 | loss scale: 16384.0 | grad norm: 58744.040 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2386/ 159576 | consumed samples: 38176 | elapsed time per iteration (ms): 14051.4 | learning rate: 1.058E-05 | global batch size: 16 | lm loss: 6.409803E+00 | loss scale: 16384.0 | grad norm: 54804.059 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2387/ 159576 | consumed samples: 38192 | elapsed time per iteration (ms): 13628.8 | learning rate: 1.058E-05 | global batch size: 16 | lm loss: 6.752995E+00 | loss scale: 16384.0 | grad norm: 57078.432 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2388/ 159576 | consumed samples: 38208 | elapsed time per iteration (ms): 13611.0 | learning rate: 1.059E-05 | global batch size: 16 | lm loss: 6.738320E+00 | loss scale: 16384.0 | grad norm: 45381.080 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2389/ 159576 | consumed samples: 38224 | elapsed time per iteration (ms): 13583.7 | learning rate: 1.059E-05 | global batch size: 16 | lm loss: 6.858883E+00 | loss scale: 16384.0 | grad norm: 86212.464 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2390/ 159576 | consumed samples: 38240 | elapsed time per iteration (ms): 13679.8 | learning rate: 1.060E-05 | global batch size: 16 | lm loss: 7.024375E+00 | loss scale: 16384.0 | grad norm: 66322.711 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2391/ 159576 | consumed samples: 38256 | elapsed time per iteration (ms): 13997.0 | learning rate: 1.060E-05 | global batch size: 16 | lm loss: 6.983364E+00 | loss scale: 16384.0 | grad norm: 84730.119 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2392/ 159576 | consumed samples: 38272 | elapsed time per iteration (ms): 13673.8 | learning rate: 1.061E-05 | global batch size: 16 | lm loss: 6.900928E+00 | loss scale: 16384.0 | grad norm: 52849.295 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2393/ 159576 | consumed samples: 38288 | elapsed time per iteration (ms): 13615.2 | learning rate: 1.061E-05 | global batch size: 16 | lm loss: 6.866693E+00 | loss scale: 16384.0 | grad norm: 87208.382 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2394/ 159576 | consumed samples: 38304 | elapsed time per iteration (ms): 13615.9 | learning rate: 1.062E-05 | global batch size: 16 | lm loss: 6.702727E+00 | loss scale: 16384.0 | grad norm: 69928.497 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2395/ 159576 | consumed samples: 38320 | elapsed time per iteration (ms): 14056.6 | learning rate: 1.062E-05 | global batch size: 16 | lm loss: 6.909261E+00 | loss scale: 16384.0 | grad norm: 122690.959 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2396/ 159576 | consumed samples: 38336 | elapsed time per iteration (ms): 13483.1 | learning rate: 1.062E-05 | global batch size: 16 | lm loss: 6.938586E+00 | loss scale: 16384.0 | grad norm: 80283.093 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2397/ 159576 | consumed samples: 38352 | elapsed time per iteration (ms): 13678.0 | learning rate: 1.063E-05 | global batch size: 16 | lm loss: 6.916673E+00 | loss scale: 16384.0 | grad norm: 78417.587 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2398/ 159576 | consumed samples: 38368 | elapsed time per iteration (ms): 13713.3 | learning rate: 1.063E-05 | global batch size: 16 | lm loss: 6.894761E+00 | loss scale: 16384.0 | grad norm: 79613.154 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2399/ 159576 | consumed samples: 38384 | elapsed time per iteration (ms): 13844.0 | learning rate: 1.064E-05 | global batch size: 16 | lm loss: 6.895288E+00 | loss scale: 16384.0 | grad norm: 117360.649 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2400/ 159576 | consumed samples: 38400 | elapsed time per iteration (ms): 13869.8 | learning rate: 1.064E-05 | global batch size: 16 | lm loss: 7.002610E+00 | loss scale: 16384.0 | grad norm: 98958.976 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2401/ 159576 | consumed samples: 38416 | elapsed time per iteration (ms): 13601.8 | learning rate: 1.065E-05 | global batch size: 16 | lm loss: 6.744779E+00 | loss scale: 16384.0 | grad norm: 75497.856 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2402/ 159576 | consumed samples: 38432 | elapsed time per iteration (ms): 13599.2 | learning rate: 1.065E-05 | global batch size: 16 | lm loss: 7.107717E+00 | loss scale: 16384.0 | grad norm: 78343.496 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2403/ 159576 | consumed samples: 38448 | elapsed time per iteration (ms): 13623.1 | learning rate: 1.066E-05 | global batch size: 16 | lm loss: 6.897991E+00 | loss scale: 16384.0 | grad norm: 89054.642 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2404/ 159576 | consumed samples: 38464 | elapsed time per iteration (ms): 14088.2 | learning rate: 1.066E-05 | global batch size: 16 | lm loss: 6.915084E+00 | loss scale: 16384.0 | grad norm: 88153.392 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2405/ 159576 | consumed samples: 38480 | elapsed time per iteration (ms): 13711.7 | learning rate: 1.066E-05 | global batch size: 16 | lm loss: 6.791551E+00 | loss scale: 16384.0 | grad norm: 81047.565 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2406/ 159576 | consumed samples: 38496 | elapsed time per iteration (ms): 13659.9 | learning rate: 1.067E-05 | global batch size: 16 | lm loss: 6.768214E+00 | loss scale: 16384.0 | grad norm: 63942.069 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2407/ 159576 | consumed samples: 38512 | elapsed time per iteration (ms): 13659.5 | learning rate: 1.067E-05 | global batch size: 16 | lm loss: 6.785830E+00 | loss scale: 16384.0 | grad norm: 50544.281 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2408/ 159576 | consumed samples: 38528 | elapsed time per iteration (ms): 14010.2 | learning rate: 1.068E-05 | global batch size: 16 | lm loss: 6.781000E+00 | loss scale: 16384.0 | grad norm: 114170.359 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2409/ 159576 | consumed samples: 38544 | elapsed time per iteration (ms): 13587.7 | learning rate: 1.068E-05 | global batch size: 16 | lm loss: 6.876911E+00 | loss scale: 16384.0 | grad norm: 60235.711 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2410/ 159576 | consumed samples: 38560 | elapsed time per iteration (ms): 13605.6 | learning rate: 1.069E-05 | global batch size: 16 | lm loss: 6.837091E+00 | loss scale: 16384.0 | grad norm: 72387.988 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2411/ 159576 | consumed samples: 38576 | elapsed time per iteration (ms): 13675.7 | learning rate: 1.069E-05 | global batch size: 16 | lm loss: 6.912636E+00 | loss scale: 16384.0 | grad norm: 76432.994 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2412/ 159576 | consumed samples: 38592 | elapsed time per iteration (ms): 13569.6 | learning rate: 1.070E-05 | global batch size: 16 | lm loss: 6.712539E+00 | loss scale: 16384.0 | grad norm: 113832.300 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2413/ 159576 | consumed samples: 38608 | elapsed time per iteration (ms): 13932.9 | learning rate: 1.070E-05 | global batch size: 16 | lm loss: 6.804219E+00 | loss scale: 16384.0 | grad norm: 73073.257 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2414/ 159576 | consumed samples: 38624 | elapsed time per iteration (ms): 13742.1 | learning rate: 1.070E-05 | global batch size: 16 | lm loss: 6.947999E+00 | loss scale: 16384.0 | grad norm: 90599.997 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2415/ 159576 | consumed samples: 38640 | elapsed time per iteration (ms): 13556.3 | learning rate: 1.071E-05 | global batch size: 16 | lm loss: 7.002557E+00 | loss scale: 16384.0 | grad norm: 71840.830 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2416/ 159576 | consumed samples: 38656 | elapsed time per iteration (ms): 13593.5 | learning rate: 1.071E-05 | global batch size: 16 | lm loss: 6.920745E+00 | loss scale: 16384.0 | grad norm: 60284.538 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2417/ 159576 | consumed samples: 38672 | elapsed time per iteration (ms): 14084.6 | learning rate: 1.072E-05 | global batch size: 16 | lm loss: 7.137000E+00 | loss scale: 16384.0 | grad norm: 185539.999 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2418/ 159576 | consumed samples: 38688 | elapsed time per iteration (ms): 13641.5 | learning rate: 1.072E-05 | global batch size: 16 | lm loss: 6.757603E+00 | loss scale: 16384.0 | grad norm: 127319.529 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2419/ 159576 | consumed samples: 38704 | elapsed time per iteration (ms): 13580.1 | learning rate: 1.073E-05 | global batch size: 16 | lm loss: 6.869411E+00 | loss scale: 16384.0 | grad norm: 97709.249 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2420/ 159576 | consumed samples: 38720 | elapsed time per iteration (ms): 13629.2 | learning rate: 1.073E-05 | global batch size: 16 | lm loss: 6.709553E+00 | loss scale: 16384.0 | grad norm: 92144.986 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2421/ 159576 | consumed samples: 38736 | elapsed time per iteration (ms): 14151.6 | learning rate: 1.074E-05 | global batch size: 16 | lm loss: 6.884684E+00 | loss scale: 16384.0 | grad norm: 68698.421 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2422/ 159576 | consumed samples: 38752 | elapsed time per iteration (ms): 13613.5 | learning rate: 1.074E-05 | global batch size: 16 | lm loss: 6.869916E+00 | loss scale: 16384.0 | grad norm: 183504.116 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2423/ 159576 | consumed samples: 38768 | elapsed time per iteration (ms): 13633.7 | learning rate: 1.074E-05 | global batch size: 16 | lm loss: 6.890718E+00 | loss scale: 16384.0 | grad norm: 156548.776 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2424/ 159576 | consumed samples: 38784 | elapsed time per iteration (ms): 13607.9 | learning rate: 1.075E-05 | global batch size: 16 | lm loss: 6.935307E+00 | loss scale: 16384.0 | grad norm: 64330.150 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2425/ 159576 | consumed samples: 38800 | elapsed time per iteration (ms): 13605.4 | learning rate: 1.075E-05 | global batch size: 16 | lm loss: 6.766086E+00 | loss scale: 16384.0 | grad norm: 69465.082 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2426/ 159576 | consumed samples: 38816 | elapsed time per iteration (ms): 13928.6 | learning rate: 1.076E-05 | global batch size: 16 | lm loss: 7.066947E+00 | loss scale: 16384.0 | grad norm: 107634.865 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2427/ 159576 | consumed samples: 38832 | elapsed time per iteration (ms): 13650.1 | learning rate: 1.076E-05 | global batch size: 16 | lm loss: 7.050639E+00 | loss scale: 16384.0 | grad norm: 95342.870 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2428/ 159576 | consumed samples: 38848 | elapsed time per iteration (ms): 13681.2 | learning rate: 1.077E-05 | global batch size: 16 | lm loss: 6.855616E+00 | loss scale: 16384.0 | grad norm: 59595.304 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2429/ 159576 | consumed samples: 38864 | elapsed time per iteration (ms): 13695.9 | learning rate: 1.077E-05 | global batch size: 16 | lm loss: 7.041804E+00 | loss scale: 16384.0 | grad norm: 65131.323 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2430/ 159576 | consumed samples: 38880 | elapsed time per iteration (ms): 13962.7 | learning rate: 1.078E-05 | global batch size: 16 | lm loss: 6.803939E+00 | loss scale: 16384.0 | grad norm: 63269.225 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2431/ 159576 | consumed samples: 38896 | elapsed time per iteration (ms): 13583.2 | learning rate: 1.078E-05 | global batch size: 16 | lm loss: 6.876345E+00 | loss scale: 16384.0 | grad norm: 74949.342 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2432/ 159576 | consumed samples: 38912 | elapsed time per iteration (ms): 13606.6 | learning rate: 1.078E-05 | global batch size: 16 | lm loss: 6.916327E+00 | loss scale: 16384.0 | grad norm: 74586.629 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2433/ 159576 | consumed samples: 38928 | elapsed time per iteration (ms): 13607.5 | learning rate: 1.079E-05 | global batch size: 16 | lm loss: 6.779680E+00 | loss scale: 16384.0 | grad norm: 82519.547 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2434/ 159576 | consumed samples: 38944 | elapsed time per iteration (ms): 13894.0 | learning rate: 1.079E-05 | global batch size: 16 | lm loss: 6.903611E+00 | loss scale: 16384.0 | grad norm: 69004.549 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2435/ 159576 | consumed samples: 38960 | elapsed time per iteration (ms): 13779.1 | learning rate: 1.080E-05 | global batch size: 16 | lm loss: 6.630243E+00 | loss scale: 16384.0 | grad norm: 107197.604 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2436/ 159576 | consumed samples: 38976 | elapsed time per iteration (ms): 13659.0 | learning rate: 1.080E-05 | global batch size: 16 | lm loss: 6.876919E+00 | loss scale: 16384.0 | grad norm: 77407.687 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2437/ 159576 | consumed samples: 38992 | elapsed time per iteration (ms): 13553.5 | learning rate: 1.081E-05 | global batch size: 16 | lm loss: 6.728307E+00 | loss scale: 16384.0 | grad norm: 79645.236 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2438/ 159576 | consumed samples: 39008 | elapsed time per iteration (ms): 13664.0 | learning rate: 1.081E-05 | global batch size: 16 | lm loss: 6.923852E+00 | loss scale: 16384.0 | grad norm: 70221.677 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2439/ 159576 | consumed samples: 39024 | elapsed time per iteration (ms): 13814.4 | learning rate: 1.082E-05 | global batch size: 16 | lm loss: 6.729681E+00 | loss scale: 16384.0 | grad norm: 71734.084 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2440/ 159576 | consumed samples: 39040 | elapsed time per iteration (ms): 13667.6 | learning rate: 1.082E-05 | global batch size: 16 | lm loss: 6.668837E+00 | loss scale: 16384.0 | grad norm: 69995.202 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2441/ 159576 | consumed samples: 39056 | elapsed time per iteration (ms): 13617.8 | learning rate: 1.082E-05 | global batch size: 16 | lm loss: 6.781438E+00 | loss scale: 16384.0 | grad norm: 49304.992 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2442/ 159576 | consumed samples: 39072 | elapsed time per iteration (ms): 13652.0 | learning rate: 1.083E-05 | global batch size: 16 | lm loss: 6.810652E+00 | loss scale: 16384.0 | grad norm: 86564.989 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2443/ 159576 | consumed samples: 39088 | elapsed time per iteration (ms): 14063.1 | learning rate: 1.083E-05 | global batch size: 16 | lm loss: 6.879047E+00 | loss scale: 16384.0 | grad norm: 56659.131 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2444/ 159576 | consumed samples: 39104 | elapsed time per iteration (ms): 13586.9 | learning rate: 1.084E-05 | global batch size: 16 | lm loss: 6.494076E+00 | loss scale: 16384.0 | grad norm: 72585.008 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2445/ 159576 | consumed samples: 39120 | elapsed time per iteration (ms): 13676.6 | learning rate: 1.084E-05 | global batch size: 16 | lm loss: 6.713490E+00 | loss scale: 16384.0 | grad norm: 68348.114 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2446/ 159576 | consumed samples: 39136 | elapsed time per iteration (ms): 13706.8 | learning rate: 1.085E-05 | global batch size: 16 | lm loss: 6.970970E+00 | loss scale: 16384.0 | grad norm: 145461.809 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2447/ 159576 | consumed samples: 39152 | elapsed time per iteration (ms): 13581.7 | learning rate: 1.085E-05 | global batch size: 16 | lm loss: 6.777845E+00 | loss scale: 16384.0 | grad norm: 67935.233 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2448/ 159576 | consumed samples: 39168 | elapsed time per iteration (ms): 13810.2 | learning rate: 1.086E-05 | global batch size: 16 | lm loss: 6.772415E+00 | loss scale: 16384.0 | grad norm: 86835.992 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2449/ 159576 | consumed samples: 39184 | elapsed time per iteration (ms): 13641.6 | learning rate: 1.086E-05 | global batch size: 16 | lm loss: 6.901608E+00 | loss scale: 16384.0 | grad norm: 86381.928 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2450/ 159576 | consumed samples: 39200 | elapsed time per iteration (ms): 13577.4 | learning rate: 1.086E-05 | global batch size: 16 | lm loss: 6.923601E+00 | loss scale: 16384.0 | grad norm: 67065.336 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2451/ 159576 | consumed samples: 39216 | elapsed time per iteration (ms): 13656.8 | learning rate: 1.087E-05 | global batch size: 16 | lm loss: 6.635858E+00 | loss scale: 16384.0 | grad norm: 118766.424 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2452/ 159576 | consumed samples: 39232 | elapsed time per iteration (ms): 14182.2 | learning rate: 1.087E-05 | global batch size: 16 | lm loss: 6.798747E+00 | loss scale: 16384.0 | grad norm: 86778.590 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2453/ 159576 | consumed samples: 39248 | elapsed time per iteration (ms): 13794.7 | learning rate: 1.088E-05 | global batch size: 16 | lm loss: 6.934669E+00 | loss scale: 16384.0 | grad norm: 72867.559 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2454/ 159576 | consumed samples: 39264 | elapsed time per iteration (ms): 13649.1 | learning rate: 1.088E-05 | global batch size: 16 | lm loss: 6.689157E+00 | loss scale: 16384.0 | grad norm: 53809.726 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2455/ 159576 | consumed samples: 39280 | elapsed time per iteration (ms): 13619.0 | learning rate: 1.089E-05 | global batch size: 16 | lm loss: 6.797565E+00 | loss scale: 16384.0 | grad norm: 130277.119 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2456/ 159576 | consumed samples: 39296 | elapsed time per iteration (ms): 14036.7 | learning rate: 1.089E-05 | global batch size: 16 | lm loss: 6.919378E+00 | loss scale: 16384.0 | grad norm: 68731.938 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2457/ 159576 | consumed samples: 39312 | elapsed time per iteration (ms): 13656.3 | learning rate: 1.089E-05 | global batch size: 16 | lm loss: 6.658165E+00 | loss scale: 16384.0 | grad norm: 90782.352 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2458/ 159576 | consumed samples: 39328 | elapsed time per iteration (ms): 13635.5 | learning rate: 1.090E-05 | global batch size: 16 | lm loss: 6.614546E+00 | loss scale: 16384.0 | grad norm: 80319.945 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2459/ 159576 | consumed samples: 39344 | elapsed time per iteration (ms): 13648.3 | learning rate: 1.090E-05 | global batch size: 16 | lm loss: 6.813863E+00 | loss scale: 16384.0 | grad norm: 96291.265 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2460/ 159576 | consumed samples: 39360 | elapsed time per iteration (ms): 13655.8 | learning rate: 1.091E-05 | global batch size: 16 | lm loss: 7.162710E+00 | loss scale: 16384.0 | grad norm: 58863.008 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2461/ 159576 | consumed samples: 39376 | elapsed time per iteration (ms): 13960.2 | learning rate: 1.091E-05 | global batch size: 16 | lm loss: 6.991768E+00 | loss scale: 16384.0 | grad norm: 72538.165 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2462/ 159576 | consumed samples: 39392 | elapsed time per iteration (ms): 13649.7 | learning rate: 1.092E-05 | global batch size: 16 | lm loss: 6.712080E+00 | loss scale: 16384.0 | grad norm: 76061.911 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2463/ 159576 | consumed samples: 39408 | elapsed time per iteration (ms): 13665.9 | learning rate: 1.092E-05 | global batch size: 16 | lm loss: 6.697587E+00 | loss scale: 16384.0 | grad norm: 78444.184 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2464/ 159576 | consumed samples: 39424 | elapsed time per iteration (ms): 13548.3 | learning rate: 1.093E-05 | global batch size: 16 | lm loss: 6.767040E+00 | loss scale: 16384.0 | grad norm: 71114.390 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2465/ 159576 | consumed samples: 39440 | elapsed time per iteration (ms): 13972.6 | learning rate: 1.093E-05 | global batch size: 16 | lm loss: 6.750882E+00 | loss scale: 16384.0 | grad norm: 60498.457 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2466/ 159576 | consumed samples: 39456 | elapsed time per iteration (ms): 13657.9 | learning rate: 1.093E-05 | global batch size: 16 | lm loss: 6.631062E+00 | loss scale: 16384.0 | grad norm: 75019.075 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2467/ 159576 | consumed samples: 39472 | elapsed time per iteration (ms): 13692.3 | learning rate: 1.094E-05 | global batch size: 16 | lm loss: 6.725332E+00 | loss scale: 16384.0 | grad norm: 53922.114 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2468/ 159576 | consumed samples: 39488 | elapsed time per iteration (ms): 13656.1 | learning rate: 1.094E-05 | global batch size: 16 | lm loss: 6.736504E+00 | loss scale: 16384.0 | grad norm: 54250.719 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2469/ 159576 | consumed samples: 39504 | elapsed time per iteration (ms): 14009.1 | learning rate: 1.095E-05 | global batch size: 16 | lm loss: 6.881338E+00 | loss scale: 16384.0 | grad norm: 64641.652 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2470/ 159576 | consumed samples: 39520 | elapsed time per iteration (ms): 13853.1 | learning rate: 1.095E-05 | global batch size: 16 | lm loss: 6.742140E+00 | loss scale: 16384.0 | grad norm: 52195.808 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2471/ 159576 | consumed samples: 39536 | elapsed time per iteration (ms): 13541.2 | learning rate: 1.096E-05 | global batch size: 16 | lm loss: 6.830609E+00 | loss scale: 16384.0 | grad norm: 98883.799 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2472/ 159576 | consumed samples: 39552 | elapsed time per iteration (ms): 13618.7 | learning rate: 1.096E-05 | global batch size: 16 | lm loss: 6.770423E+00 | loss scale: 16384.0 | grad norm: 66896.725 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2473/ 159576 | consumed samples: 39568 | elapsed time per iteration (ms): 13623.5 | learning rate: 1.097E-05 | global batch size: 16 | lm loss: 6.926878E+00 | loss scale: 16384.0 | grad norm: 74406.160 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2474/ 159576 | consumed samples: 39584 | elapsed time per iteration (ms): 14089.9 | learning rate: 1.097E-05 | global batch size: 16 | lm loss: 6.834147E+00 | loss scale: 16384.0 | grad norm: 61442.184 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2475/ 159576 | consumed samples: 39600 | elapsed time per iteration (ms): 13713.9 | learning rate: 1.097E-05 | global batch size: 16 | lm loss: 6.711390E+00 | loss scale: 16384.0 | grad norm: 72993.188 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2476/ 159576 | consumed samples: 39616 | elapsed time per iteration (ms): 13666.0 | learning rate: 1.098E-05 | global batch size: 16 | lm loss: 6.715760E+00 | loss scale: 16384.0 | grad norm: 54753.919 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2477/ 159576 | consumed samples: 39632 | elapsed time per iteration (ms): 13628.3 | learning rate: 1.098E-05 | global batch size: 16 | lm loss: 7.034068E+00 | loss scale: 16384.0 | grad norm: 65362.654 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2478/ 159576 | consumed samples: 39648 | elapsed time per iteration (ms): 14016.3 | learning rate: 1.099E-05 | global batch size: 16 | lm loss: 6.848239E+00 | loss scale: 16384.0 | grad norm: 59886.005 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2479/ 159576 | consumed samples: 39664 | elapsed time per iteration (ms): 13518.2 | learning rate: 1.099E-05 | global batch size: 16 | lm loss: 6.766425E+00 | loss scale: 32768.0 | grad norm: 47600.323 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2480/ 159576 | consumed samples: 39680 | elapsed time per iteration (ms): 13611.4 | learning rate: 1.100E-05 | global batch size: 16 | lm loss: 6.569361E+00 | loss scale: 32768.0 | grad norm: 173183.602 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2481/ 159576 | consumed samples: 39696 | elapsed time per iteration (ms): 13649.6 | learning rate: 1.100E-05 | global batch size: 16 | lm loss: 6.977244E+00 | loss scale: 32768.0 | grad norm: 114608.298 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2482/ 159576 | consumed samples: 39712 | elapsed time per iteration (ms): 13592.7 | learning rate: 1.101E-05 | global batch size: 16 | lm loss: 6.743002E+00 | loss scale: 32768.0 | grad norm: 157122.447 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2483/ 159576 | consumed samples: 39728 | elapsed time per iteration (ms): 13957.3 | learning rate: 1.101E-05 | global batch size: 16 | lm loss: 6.786878E+00 | loss scale: 32768.0 | grad norm: 124608.544 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2484/ 159576 | consumed samples: 39744 | elapsed time per iteration (ms): 13654.6 | learning rate: 1.101E-05 | global batch size: 16 | lm loss: 6.859965E+00 | loss scale: 32768.0 | grad norm: 232222.713 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2485/ 159576 | consumed samples: 39760 | elapsed time per iteration (ms): 13613.9 | learning rate: 1.102E-05 | global batch size: 16 | lm loss: 6.802356E+00 | loss scale: 32768.0 | grad norm: 156829.946 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2486/ 159576 | consumed samples: 39776 | elapsed time per iteration (ms): 13653.4 | learning rate: 1.102E-05 | global batch size: 16 | lm loss: 6.710648E+00 | loss scale: 32768.0 | grad norm: 134523.046 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2487/ 159576 | consumed samples: 39792 | elapsed time per iteration (ms): 14072.7 | learning rate: 1.103E-05 | global batch size: 16 | lm loss: 6.797608E+00 | loss scale: 32768.0 | grad norm: 125011.237 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2488/ 159576 | consumed samples: 39808 | elapsed time per iteration (ms): 13639.9 | learning rate: 1.103E-05 | global batch size: 16 | lm loss: 6.854223E+00 | loss scale: 32768.0 | grad norm: 260551.098 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2489/ 159576 | consumed samples: 39824 | elapsed time per iteration (ms): 13577.6 | learning rate: 1.104E-05 | global batch size: 16 | lm loss: 6.603992E+00 | loss scale: 32768.0 | grad norm: 181893.334 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2490/ 159576 | consumed samples: 39840 | elapsed time per iteration (ms): 13675.7 | learning rate: 1.104E-05 | global batch size: 16 | lm loss: 6.694830E+00 | loss scale: 32768.0 | grad norm: 141757.675 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2491/ 159576 | consumed samples: 39856 | elapsed time per iteration (ms): 14083.9 | learning rate: 1.105E-05 | global batch size: 16 | lm loss: 6.642892E+00 | loss scale: 32768.0 | grad norm: 119287.049 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2492/ 159576 | consumed samples: 39872 | elapsed time per iteration (ms): 13603.6 | learning rate: 1.105E-05 | global batch size: 16 | lm loss: 6.801910E+00 | loss scale: 32768.0 | grad norm: 155539.404 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2493/ 159576 | consumed samples: 39888 | elapsed time per iteration (ms): 13598.7 | learning rate: 1.105E-05 | global batch size: 16 | lm loss: 6.791874E+00 | loss scale: 32768.0 | grad norm: 122407.998 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2494/ 159576 | consumed samples: 39904 | elapsed time per iteration (ms): 13643.8 | learning rate: 1.106E-05 | global batch size: 16 | lm loss: 6.826643E+00 | loss scale: 32768.0 | grad norm: 128586.240 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2495/ 159576 | consumed samples: 39920 | elapsed time per iteration (ms): 13584.0 | learning rate: 1.106E-05 | global batch size: 16 | lm loss: 6.715306E+00 | loss scale: 32768.0 | grad norm: 99484.803 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2496/ 159576 | consumed samples: 39936 | elapsed time per iteration (ms): 13754.1 | learning rate: 1.107E-05 | global batch size: 16 | lm loss: 6.833625E+00 | loss scale: 32768.0 | grad norm: 115202.668 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2497/ 159576 | consumed samples: 39952 | elapsed time per iteration (ms): 13634.3 | learning rate: 1.107E-05 | global batch size: 16 | lm loss: 6.915625E+00 | loss scale: 32768.0 | grad norm: 186838.919 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2498/ 159576 | consumed samples: 39968 | elapsed time per iteration (ms): 13644.0 | learning rate: 1.108E-05 | global batch size: 16 | lm loss: 6.967087E+00 | loss scale: 32768.0 | grad norm: 131122.134 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2499/ 159576 | consumed samples: 39984 | elapsed time per iteration (ms): 13681.7 | learning rate: 1.108E-05 | global batch size: 16 | lm loss: 6.760918E+00 | loss scale: 32768.0 | grad norm: 194624.256 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2500/ 159576 | consumed samples: 40000 | elapsed time per iteration (ms): 14007.6 | learning rate: 1.109E-05 | global batch size: 16 | lm loss: 6.979738E+00 | loss scale: 32768.0 | grad norm: 156689.771 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2501/ 159576 | consumed samples: 40016 | elapsed time per iteration (ms): 13617.5 | learning rate: 1.109E-05 | global batch size: 16 | lm loss: 6.789479E+00 | loss scale: 32768.0 | grad norm: 144780.709 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2502/ 159576 | consumed samples: 40032 | elapsed time per iteration (ms): 13599.5 | learning rate: 1.109E-05 | global batch
size: 16 | lm loss: 6.864005E+00 | loss scale: 32768.0 | grad norm: 170229.489 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2503/ 159576 | consumed samples: 40048 | elapsed time per iteration (ms): 13573.2 | learning rate: 1.110E-05 | global batch size: 16 | lm loss: 6.666573E+00 | loss scale: 32768.0 | grad norm: 146264.627 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2504/ 159576 | consumed samples: 40064 | elapsed time per iteration (ms): 13981.7 | learning rate: 1.110E-05 | global batch size: 16 | lm loss: 6.757555E+00 | loss scale: 32768.0 | grad norm: 194432.846 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2505/ 159576 | consumed samples: 40080 | elapsed time per iteration (ms): 13815.5 | learning rate: 1.111E-05 | global batch size: 16 | lm loss: 7.060199E+00 | loss scale: 32768.0 | grad norm: 107664.354 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2506/ 159576 | consumed samples: 40096 | elapsed time per iteration (ms): 13708.3 | learning rate: 1.111E-05 | global batch size: 16 | lm loss: 6.757818E+00 | loss scale: 32768.0 | grad norm: 172391.067 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2507/ 159576 | consumed samples: 40112 | elapsed time per iteration (ms): 13682.1 | learning rate: 1.112E-05 | global batch size: 16 | lm loss: 6.957751E+00 | loss scale: 32768.0 | grad norm: 153732.331 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2508/ 159576 | consumed samples: 40128 | elapsed time per iteration (ms): 13651.8 | learning rate: 1.112E-05 | global batch size: 16 | lm loss: 6.697278E+00 | loss scale: 32768.0 | grad norm: 269873.049 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | 
time (ms) iteration 2509/ 159576 | consumed samples: 40144 | elapsed time per iteration (ms): 13847.8 | learning rate: 1.113E-05 | global batch size: 16 | lm loss: 6.915687E+00 | loss scale: 32768.0 | grad norm: 203672.027 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2510/ 159576 | consumed samples: 40160 | elapsed time per iteration (ms): 13726.7 | learning rate: 1.113E-05 | global batch size: 16 | lm loss: 6.563999E+00 | loss scale: 32768.0 | grad norm: 156793.595 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2511/ 159576 | consumed samples: 40176 | elapsed time per iteration (ms): 13592.8 | learning rate: 1.113E-05 | global batch size: 16 | lm loss: 6.816392E+00 | loss scale: 32768.0 | grad norm: 174319.403 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2512/ 159576 | consumed samples: 40192 | elapsed time per iteration (ms): 13663.1 | learning rate: 1.114E-05 | global batch size: 16 | lm loss: 6.610006E+00 | loss scale: 32768.0 | grad norm: 205941.600 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2513/ 159576 | consumed samples: 40208 | elapsed time per iteration (ms): 13997.4 | learning rate: 1.114E-05 | global batch size: 16 | lm loss: 6.968318E+00 | loss scale: 32768.0 | grad norm: 198426.978 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2514/ 159576 | consumed samples: 40224 | elapsed time per iteration (ms): 13639.5 | learning rate: 1.115E-05 | global batch size: 16 | lm loss: 6.754237E+00 | loss scale: 32768.0 | grad norm: 150994.686 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2515/ 159576 | consumed samples: 40240 | elapsed time per iteration (ms): 13721.6 | learning rate: 1.115E-05 | global batch size: 16 | lm loss: 
6.780080E+00 | loss scale: 32768.0 | grad norm: 221933.544 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2516/ 159576 | consumed samples: 40256 | elapsed time per iteration (ms): 13588.8 | learning rate: 1.116E-05 | global batch size: 16 | lm loss: 7.005465E+00 | loss scale: 32768.0 | grad norm: 111981.898 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2517/ 159576 | consumed samples: 40272 | elapsed time per iteration (ms): 13636.9 | learning rate: 1.116E-05 | global batch size: 16 | lm loss: 7.038844E+00 | loss scale: 32768.0 | grad norm: 207331.802 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2518/ 159576 | consumed samples: 40288 | elapsed time per iteration (ms): 13872.4 | learning rate: 1.117E-05 | global batch size: 16 | lm loss: 6.753989E+00 | loss scale: 32768.0 | grad norm: 152725.941 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2519/ 159576 | consumed samples: 40304 | elapsed time per iteration (ms): 13607.9 | learning rate: 1.117E-05 | global batch size: 16 | lm loss: 6.981558E+00 | loss scale: 32768.0 | grad norm: 154949.465 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2520/ 159576 | consumed samples: 40320 | elapsed time per iteration (ms): 13684.9 | learning rate: 1.117E-05 | global batch size: 16 | lm loss: 6.906241E+00 | loss scale: 32768.0 | grad norm: 125549.575 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2521/ 159576 | consumed samples: 40336 | elapsed time per iteration (ms): 13716.2 | learning rate: 1.118E-05 | global batch size: 16 | lm loss: 6.747027E+00 | loss scale: 32768.0 | grad norm: 122780.845 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 
2522/ 159576 | consumed samples: 40352 | elapsed time per iteration (ms): 14167.1 | learning rate: 1.118E-05 | global batch size: 16 | lm loss: 6.970352E+00 | loss scale: 32768.0 | grad norm: 118819.513 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2523/ 159576 | consumed samples: 40368 | elapsed time per iteration (ms): 13664.4 | learning rate: 1.119E-05 | global batch size: 16 | lm loss: 6.714174E+00 | loss scale: 32768.0 | grad norm: 146027.986 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2524/ 159576 | consumed samples: 40384 | elapsed time per iteration (ms): 13630.7 | learning rate: 1.119E-05 | global batch size: 16 | lm loss: 6.610335E+00 | loss scale: 32768.0 | grad norm: 242081.240 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2525/ 159576 | consumed samples: 40400 | elapsed time per iteration (ms): 13685.5 | learning rate: 1.120E-05 | global batch size: 16 | lm loss: 6.889633E+00 | loss scale: 32768.0 | grad norm: 125371.781 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2526/ 159576 | consumed samples: 40416 | elapsed time per iteration (ms): 13989.6 | learning rate: 1.120E-05 | global batch size: 16 | lm loss: 6.703308E+00 | loss scale: 32768.0 | grad norm: 229244.600 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2527/ 159576 | consumed samples: 40432 | elapsed time per iteration (ms): 13653.7 | learning rate: 1.121E-05 | global batch size: 16 | lm loss: 6.903625E+00 | loss scale: 32768.0 | grad norm: 180615.201 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2528/ 159576 | consumed samples: 40448 | elapsed time per iteration (ms): 13688.8 | learning rate: 1.121E-05 | global batch size: 16 | lm loss: 6.882591E+00 | loss 
scale: 32768.0 | grad norm: 123446.214 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2529/ 159576 | consumed samples: 40464 | elapsed time per iteration (ms): 13727.9 | learning rate: 1.121E-05 | global batch size: 16 | lm loss: 6.771068E+00 | loss scale: 32768.0 | grad norm: 136122.381 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2530/ 159576 | consumed samples: 40480 | elapsed time per iteration (ms): 13727.3 | learning rate: 1.122E-05 | global batch size: 16 | lm loss: 6.839997E+00 | loss scale: 32768.0 | grad norm: 198759.749 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2531/ 159576 | consumed samples: 40496 | elapsed time per iteration (ms): 13882.2 | learning rate: 1.122E-05 | global batch size: 16 | lm loss: 6.934726E+00 | loss scale: 32768.0 | grad norm: 140393.181 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2532/ 159576 | consumed samples: 40512 | elapsed time per iteration (ms): 13707.7 | learning rate: 1.123E-05 | global batch size: 16 | lm loss: 6.824786E+00 | loss scale: 32768.0 | grad norm: 136497.509 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2533/ 159576 | consumed samples: 40528 | elapsed time per iteration (ms): 13668.7 | learning rate: 1.123E-05 | global batch size: 16 | lm loss: 6.638996E+00 | loss scale: 32768.0 | grad norm: 108086.442 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2534/ 159576 | consumed samples: 40544 | elapsed time per iteration (ms): 13600.7 | learning rate: 1.124E-05 | global batch size: 16 | lm loss: 6.684957E+00 | loss scale: 32768.0 | grad norm: 136205.291 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2535/ 159576 | 
consumed samples: 40560 | elapsed time per iteration (ms): 14008.2 | learning rate: 1.124E-05 | global batch size: 16 | lm loss: 6.650595E+00 | loss scale: 32768.0 | grad norm: 89458.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2536/ 159576 | consumed samples: 40576 | elapsed time per iteration (ms): 13696.2 | learning rate: 1.125E-05 | global batch size: 16 | lm loss: 6.720654E+00 | loss scale: 32768.0 | grad norm: 207949.897 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2537/ 159576 | consumed samples: 40592 | elapsed time per iteration (ms): 13728.0 | learning rate: 1.125E-05 | global batch size: 16 | lm loss: 6.934484E+00 | loss scale: 32768.0 | grad norm: 145165.262 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2538/ 159576 | consumed samples: 40608 | elapsed time per iteration (ms): 13707.3 | learning rate: 1.125E-05 | global batch size: 16 | lm loss: 6.659933E+00 | loss scale: 32768.0 | grad norm: 109227.116 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2539/ 159576 | consumed samples: 40624 | elapsed time per iteration (ms): 14115.0 | learning rate: 1.126E-05 | global batch size: 16 | lm loss: 6.638377E+00 | loss scale: 32768.0 | grad norm: 221623.574 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2540/ 159576 | consumed samples: 40640 | elapsed time per iteration (ms): 13557.7 | learning rate: 1.126E-05 | global batch size: 16 | lm loss: 6.825821E+00 | loss scale: 32768.0 | grad norm: 114656.887 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2541/ 159576 | consumed samples: 40656 | elapsed time per iteration (ms): 13635.6 | learning rate: 1.127E-05 | global batch size: 16 | lm loss: 6.869952E+00 | loss scale: 32768.0 
| grad norm: 204975.764 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2542/ 159576 | consumed samples: 40672 | elapsed time per iteration (ms): 13682.2 | learning rate: 1.127E-05 | global batch size: 16 | lm loss: 6.829473E+00 | loss scale: 32768.0 | grad norm: 158875.582 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2543/ 159576 | consumed samples: 40688 | elapsed time per iteration (ms): 13675.9 | learning rate: 1.128E-05 | global batch size: 16 | lm loss: 6.921135E+00 | loss scale: 32768.0 | grad norm: 248424.787 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2544/ 159576 | consumed samples: 40704 | elapsed time per iteration (ms): 14035.2 | learning rate: 1.128E-05 | global batch size: 16 | lm loss: 6.734321E+00 | loss scale: 32768.0 | grad norm: 137358.902 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2545/ 159576 | consumed samples: 40720 | elapsed time per iteration (ms): 13685.4 | learning rate: 1.129E-05 | global batch size: 16 | lm loss: 6.824071E+00 | loss scale: 32768.0 | grad norm: 172473.743 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2546/ 159576 | consumed samples: 40736 | elapsed time per iteration (ms): 13704.2 | learning rate: 1.129E-05 | global batch size: 16 | lm loss: 6.741428E+00 | loss scale: 32768.0 | grad norm: 117821.390 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2547/ 159576 | consumed samples: 40752 | elapsed time per iteration (ms): 13625.1 | learning rate: 1.129E-05 | global batch size: 16 | lm loss: 6.825446E+00 | loss scale: 32768.0 | grad norm: 302813.390 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2548/ 159576 | consumed samples: 
40768 | elapsed time per iteration (ms): 13978.9 | learning rate: 1.130E-05 | global batch size: 16 | lm loss: 6.930991E+00 | loss scale: 32768.0 | grad norm: 163222.779 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2549/ 159576 | consumed samples: 40784 | elapsed time per iteration (ms): 13605.2 | learning rate: 1.130E-05 | global batch size: 16 | lm loss: 6.901045E+00 | loss scale: 32768.0 | grad norm: 178776.030 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2550/ 159576 | consumed samples: 40800 | elapsed time per iteration (ms): 13704.5 | learning rate: 1.131E-05 | global batch size: 16 | lm loss: 6.923467E+00 | loss scale: 32768.0 | grad norm: 156500.588 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2551/ 159576 | consumed samples: 40816 | elapsed time per iteration (ms): 13642.0 | learning rate: 1.131E-05 | global batch size: 16 | lm loss: 6.698053E+00 | loss scale: 32768.0 | grad norm: 142885.953 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2552/ 159576 | consumed samples: 40832 | elapsed time per iteration (ms): 13988.3 | learning rate: 1.132E-05 | global batch size: 16 | lm loss: 6.774540E+00 | loss scale: 32768.0 | grad norm: 236886.022 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2553/ 159576 | consumed samples: 40848 | elapsed time per iteration (ms): 13862.8 | learning rate: 1.132E-05 | global batch size: 16 | lm loss: 6.706432E+00 | loss scale: 32768.0 | grad norm: 178546.693 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2554/ 159576 | consumed samples: 40864 | elapsed time per iteration (ms): 13629.3 | learning rate: 1.133E-05 | global batch size: 16 | lm loss: 6.631795E+00 | loss scale: 32768.0 | grad norm: 
176739.826 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2555/ 159576 | consumed samples: 40880 | elapsed time per iteration (ms): 13608.3 | learning rate: 1.133E-05 | global batch size: 16 | lm loss: 7.180985E+00 | loss scale: 32768.0 | grad norm: 132584.462 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2556/ 159576 | consumed samples: 40896 | elapsed time per iteration (ms): 13580.0 | learning rate: 1.133E-05 | global batch size: 16 | lm loss: 6.838911E+00 | loss scale: 32768.0 | grad norm: 90158.811 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2557/ 159576 | consumed samples: 40912 | elapsed time per iteration (ms): 13942.7 | learning rate: 1.134E-05 | global batch size: 16 | lm loss: 6.693833E+00 | loss scale: 32768.0 | grad norm: 220674.059 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2558/ 159576 | consumed samples: 40928 | elapsed time per iteration (ms): 13802.7 | learning rate: 1.134E-05 | global batch size: 16 | lm loss: 6.568502E+00 | loss scale: 32768.0 | grad norm: 98298.873 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2559/ 159576 | consumed samples: 40944 | elapsed time per iteration (ms): 13641.3 | learning rate: 1.135E-05 | global batch size: 16 | lm loss: 6.635581E+00 | loss scale: 32768.0 | grad norm: 169974.305 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2560/ 159576 | consumed samples: 40960 | elapsed time per iteration (ms): 13704.3 | learning rate: 1.135E-05 | global batch size: 16 | lm loss: 6.565581E+00 | loss scale: 32768.0 | grad norm: 129387.649 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2561/ 159576 | consumed samples: 40976 | elapsed 
time per iteration (ms): 14001.7 | learning rate: 1.136E-05 | global batch size: 16 | lm loss: 6.892058E+00 | loss scale: 32768.0 | grad norm: 339367.225 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2562/ 159576 | consumed samples: 40992 | elapsed time per iteration (ms): 13513.6 | learning rate: 1.136E-05 | global batch size: 16 | lm loss: 6.762362E+00 | loss scale: 32768.0 | grad norm: 232794.254 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2563/ 159576 | consumed samples: 41008 | elapsed time per iteration (ms): 13601.0 | learning rate: 1.137E-05 | global batch size: 16 | lm loss: 6.843441E+00 | loss scale: 32768.0 | grad norm: 163664.983 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2564/ 159576 | consumed samples: 41024 | elapsed time per iteration (ms): 13594.8 | learning rate: 1.137E-05 | global batch size: 16 | lm loss: 6.819015E+00 | loss scale: 32768.0 | grad norm: 216339.488 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2565/ 159576 | consumed samples: 41040 | elapsed time per iteration (ms): 13605.6 | learning rate: 1.137E-05 | global batch size: 16 | lm loss: 6.897832E+00 | loss scale: 32768.0 | grad norm: 109607.502 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2566/ 159576 | consumed samples: 41056 | elapsed time per iteration (ms): 13861.7 | learning rate: 1.138E-05 | global batch size: 16 | lm loss: 6.875384E+00 | loss scale: 32768.0 | grad norm: 402667.413 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2567/ 159576 | consumed samples: 41072 | elapsed time per iteration (ms): 13673.7 | learning rate: 1.138E-05 | global batch size: 16 | lm loss: 6.761830E+00 | loss scale: 32768.0 | grad norm: 270664.955 | num 
zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2568/ 159576 | consumed samples: 41088 | elapsed time per iteration (ms): 13672.1 | learning rate: 1.139E-05 | global batch size: 16 | lm loss: 6.792128E+00 | loss scale: 32768.0 | grad norm: 176693.891 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2569/ 159576 | consumed samples: 41104 | elapsed time per iteration (ms): 13609.9 | learning rate: 1.139E-05 | global batch size: 16 | lm loss: 6.935215E+00 | loss scale: 32768.0 | grad norm: 182679.180 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) [2021-09-24 12:06:15] PULSE: tr8-104B is waiting for the previous job to finish before scheduling a new one using the dependency mechanism (1165978_[1-10%1] on 'gpu_p13' partition) [2021-09-24 12:06:15] PULSE: tr8-104B is running for 6:14:04 since 2021-09-24T05:52:11 (1162855_1 on 'gpu_p13' partition (r6i4n[5,7],r6i5n[2,7-8],r6i6n[0,2,6],r7i2n[4-5],r7i6n[2-4],r7i7n[7-8],r8i0n[2-3,5-8],r8i1n[0,2-4],r8i2n8,r8i3n[0-2],r8i5n[3-4],r8i7n[3-8],r9i0n[0-2],r9i1n[0-3],r9i2n[3-5,8],r9i3n[0-1,7-8],r9i4n[0-2],r9i5n[3-8],r9i6n[0,7-8]) iteration 2570/ 159576 | consumed samples: 41120 | elapsed time per iteration (ms): 14149.7 | learning rate: 1.140E-05 | global batch size: 16 | lm loss: 6.826759E+00 | loss scale: 32768.0 | grad norm: 135711.486 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2571/ 159576 | consumed samples: 41136 | elapsed time per iteration (ms): 13749.2 | learning rate: 1.140E-05 | global batch size: 16 | lm loss: 6.600703E+00 | loss scale: 32768.0 | grad norm: 143461.893 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2572/ 159576 | consumed samples: 41152 | elapsed time per iteration (ms): 13601.5 | learning rate: 1.141E-05 | global batch size: 16 | lm loss: 
6.747102E+00 | loss scale: 32768.0 | grad norm: 205480.052 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2573/ 159576 | consumed samples: 41168 | elapsed time per iteration (ms): 13680.7 | learning rate: 1.141E-05 | global batch size: 16 | lm loss: 6.767237E+00 | loss scale: 32768.0 | grad norm: 186807.581 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2574/ 159576 | consumed samples: 41184 | elapsed time per iteration (ms): 14103.7 | learning rate: 1.141E-05 | global batch size: 16 | lm loss: 6.786840E+00 | loss scale: 32768.0 | grad norm: 125986.096 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2575/ 159576 | consumed samples: 41200 | elapsed time per iteration (ms): 13634.6 | learning rate: 1.142E-05 | global batch size: 16 | lm loss: 6.740016E+00 | loss scale: 32768.0 | grad norm: 127578.945 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2576/ 159576 | consumed samples: 41216 | elapsed time per iteration (ms): 13632.4 | learning rate: 1.142E-05 | global batch size: 16 | lm loss: 6.717787E+00 | loss scale: 32768.0 | grad norm: 91352.288 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2577/ 159576 | consumed samples: 41232 | elapsed time per iteration (ms): 13613.7 | learning rate: 1.143E-05 | global batch size: 16 | lm loss: 6.736307E+00 | loss scale: 32768.0 | grad norm: 161126.891 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2578/ 159576 | consumed samples: 41248 | elapsed time per iteration (ms): 13501.7 | learning rate: 1.143E-05 | global batch size: 16 | lm loss: 6.725785E+00 | loss scale: 32768.0 | grad norm: 105065.485 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 
2579/ 159576 | consumed samples: 41264 | elapsed time per iteration (ms): 13746.0 | learning rate: 1.144E-05 | global batch size: 16 | lm loss: 6.731723E+00 | loss scale: 32768.0 | grad norm: 123413.248 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2580/ 159576 | consumed samples: 41280 | elapsed time per iteration (ms): 13621.8 | learning rate: 1.144E-05 | global batch size: 16 | lm loss: 6.889888E+00 | loss scale: 32768.0 | grad norm: 128934.561 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2581/ 159576 | consumed samples: 41296 | elapsed time per iteration (ms): 13634.3 | learning rate: 1.145E-05 | global batch size: 16 | lm loss: 6.845993E+00 | loss scale: 32768.0 | grad norm: 140353.622 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2582/ 159576 | consumed samples: 41312 | elapsed time per iteration (ms): 13645.1 | learning rate: 1.145E-05 | global batch size: 16 | lm loss: 6.922751E+00 | loss scale: 32768.0 | grad norm: 193649.510 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2583/ 159576 | consumed samples: 41328 | elapsed time per iteration (ms): 14012.6 | learning rate: 1.145E-05 | global batch size: 16 | lm loss: 6.706060E+00 | loss scale: 32768.0 | grad norm: 120536.730 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2584/ 159576 | consumed samples: 41344 | elapsed time per iteration (ms): 13567.7 | learning rate: 1.146E-05 | global batch size: 16 | lm loss: 6.729124E+00 | loss scale: 32768.0 | grad norm: 150036.593 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2585/ 159576 | consumed samples: 41360 | elapsed time per iteration (ms): 13534.2 | learning rate: 1.146E-05 | global batch size: 16 | lm loss: 6.841982E+00 | loss 
scale: 32768.0 | grad norm: 169788.083 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2586/ 159576 | consumed samples: 41376 | elapsed time per iteration (ms): 13556.0 | learning rate: 1.147E-05 | global batch size: 16 | lm loss: 6.813578E+00 | loss scale: 32768.0 | grad norm: 120615.854 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2587/ 159576 | consumed samples: 41392 | elapsed time per iteration (ms): 13668.2 | learning rate: 1.147E-05 | global batch size: 16 | lm loss: 6.675393E+00 | loss scale: 32768.0 | grad norm: 202372.780 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2588/ 159576 | consumed samples: 41408 | elapsed time per iteration (ms): 13867.2 | learning rate: 1.148E-05 | global batch size: 16 | lm loss: 6.796386E+00 | loss scale: 32768.0 | grad norm: 131901.199 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2589/ 159576 | consumed samples: 41424 | elapsed time per iteration (ms): 13636.7 | learning rate: 1.148E-05 | global batch size: 16 | lm loss: 6.783171E+00 | loss scale: 32768.0 | grad norm: 127655.447 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2590/ 159576 | consumed samples: 41440 | elapsed time per iteration (ms): 13677.9 | learning rate: 1.149E-05 | global batch size: 16 | lm loss: 6.672108E+00 | loss scale: 32768.0 | grad norm: 111803.439 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2591/ 159576 | consumed samples: 41456 | elapsed time per iteration (ms): 13670.0 | learning rate: 1.149E-05 | global batch size: 16 | lm loss: 6.894643E+00 | loss scale: 32768.0 | grad norm: 156503.152 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2592/ 159576 | 
consumed samples: 41472 | elapsed time per iteration (ms): 14137.5 | learning rate: 1.149E-05 | global batch size: 16 | lm loss: 6.765024E+00 | loss scale: 32768.0 | grad norm: 160594.152 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2593/ 159576 | consumed samples: 41488 | elapsed time per iteration (ms): 13635.7 | learning rate: 1.150E-05 | global batch size: 16 | lm loss: 6.882227E+00 | loss scale: 32768.0 | grad norm: 142008.845 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2594/ 159576 | consumed samples: 41504 | elapsed time per iteration (ms): 13592.8 | learning rate: 1.150E-05 | global batch size: 16 | lm loss: 6.750668E+00 | loss scale: 32768.0 | grad norm: 137376.665 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2595/ 159576 | consumed samples: 41520 | elapsed time per iteration (ms): 13572.7 | learning rate: 1.151E-05 | global batch size: 16 | lm loss: 6.870511E+00 | loss scale: 32768.0 | grad norm: 203139.065 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2596/ 159576 | consumed samples: 41536 | elapsed time per iteration (ms): 13955.3 | learning rate: 1.151E-05 | global batch size: 16 | lm loss: 6.952578E+00 | loss scale: 32768.0 | grad norm: 259660.982 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2597/ 159576 | consumed samples: 41552 | elapsed time per iteration (ms): 13711.6 | learning rate: 1.152E-05 | global batch size: 16 | lm loss: 6.681178E+00 | loss scale: 32768.0 | grad norm: 126907.178 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2598/ 159576 | consumed samples: 41568 | elapsed time per iteration (ms): 13707.8 | learning rate: 1.152E-05 | global batch size: 16 | lm loss: 6.610268E+00 | loss scale: 32768.0 | grad norm: 135897.348 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2599/ 159576 | consumed samples: 41584 | elapsed time per iteration (ms): 13564.4 | learning rate: 1.153E-05 | global batch size: 16 | lm loss: 6.826151E+00 | loss scale: 32768.0 | grad norm: 155911.584 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2600/ 159576 | consumed samples: 41600 | elapsed time per iteration (ms): 13546.1 | learning rate: 1.153E-05 | global batch size: 16 | lm loss: 6.632576E+00 | loss scale: 32768.0 | grad norm: 252409.904 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2601/ 159576 | consumed samples: 41616 | elapsed time per iteration (ms): 13887.8 | learning rate: 1.153E-05 | global batch size: 16 | lm loss: 6.631788E+00 | loss scale: 32768.0 | grad norm: 165940.601 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2602/ 159576 | consumed samples: 41632 | elapsed time per iteration (ms): 13567.8 | learning rate: 1.154E-05 | global batch size: 16 | lm loss: 6.939396E+00 | loss scale: 32768.0 | grad norm: 124805.953 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2603/ 159576 | consumed samples: 41648 | elapsed time per iteration (ms): 13581.4 | learning rate: 1.154E-05 | global batch size: 16 | lm loss: 6.924129E+00 | loss scale: 32768.0 | grad norm: 133938.726 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2604/ 159576 | consumed samples: 41664 | elapsed time per iteration (ms): 13613.2 | learning rate: 1.155E-05 | global batch size: 16 | lm loss: 6.660190E+00 | loss scale: 32768.0 | grad norm: 188689.396 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2605/ 159576 | consumed samples: 41680 | elapsed time per iteration (ms): 14144.8 | learning rate: 1.155E-05 | global batch size: 16 | lm loss: 6.643148E+00 | loss scale: 32768.0 | grad norm: 123140.550 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2606/ 159576 | consumed samples: 41696 | elapsed time per iteration (ms): 13667.3 | learning rate: 1.156E-05 | global batch size: 16 | lm loss: 6.805959E+00 | loss scale: 32768.0 | grad norm: 196566.691 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2607/ 159576 | consumed samples: 41712 | elapsed time per iteration (ms): 13574.2 | learning rate: 1.156E-05 | global batch size: 16 | lm loss: 6.711599E+00 | loss scale: 32768.0 | grad norm: 167578.316 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2608/ 159576 | consumed samples: 41728 | elapsed time per iteration (ms): 13571.4 | learning rate: 1.157E-05 | global batch size: 16 | lm loss: 6.852364E+00 | loss scale: 32768.0 | grad norm: 120545.344 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2609/ 159576 | consumed samples: 41744 | elapsed time per iteration (ms): 13823.4 | learning rate: 1.157E-05 | global batch size: 16 | lm loss: 6.988579E+00 | loss scale: 32768.0 | grad norm: 242130.577 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2610/ 159576 | consumed samples: 41760 | elapsed time per iteration (ms): 13677.8 | learning rate: 1.157E-05 | global batch size: 16 | lm loss: 6.640975E+00 | loss scale: 32768.0 | grad norm: 193270.029 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2611/ 159576 | consumed samples: 41776 | elapsed time per iteration (ms): 13648.9 | learning rate: 1.158E-05 | global batch size: 16 | lm loss: 6.554218E+00 | loss scale: 32768.0 | grad norm: 132307.655 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2612/ 159576 | consumed samples: 41792 | elapsed time per iteration (ms): 13675.5 | learning rate: 1.158E-05 | global batch size: 16 | lm loss: 6.875402E+00 | loss scale: 32768.0 | grad norm: 127017.802 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2613/ 159576 | consumed samples: 41808 | elapsed time per iteration (ms): 13589.6 | learning rate: 1.159E-05 | global batch size: 16 | lm loss: 6.853450E+00 | loss scale: 32768.0 | grad norm: 271835.942 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2614/ 159576 | consumed samples: 41824 | elapsed time per iteration (ms): 13981.2 | learning rate: 1.159E-05 | global batch size: 16 | lm loss: 6.810247E+00 | loss scale: 32768.0 | grad norm: 210644.569 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2615/ 159576 | consumed samples: 41840 | elapsed time per iteration (ms): 13580.3 | learning rate: 1.160E-05 | global batch size: 16 | lm loss: 6.856892E+00 | loss scale: 32768.0 | grad norm: 139996.135 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2616/ 159576 | consumed samples: 41856 | elapsed time per iteration (ms): 13592.7 | learning rate: 1.160E-05 | global batch size: 16 | lm loss: 6.687234E+00 | loss scale: 32768.0 | grad norm: 130216.414 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2617/ 159576 | consumed samples: 41872 | elapsed time per iteration (ms): 13579.5 | learning rate: 1.161E-05 | global batch size: 16 | lm loss: 6.753475E+00 | loss scale: 32768.0 | grad norm: 270435.281 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2618/ 159576 | consumed samples: 41888 | elapsed time per iteration (ms): 14037.5 | learning rate: 1.161E-05 | global batch size: 16 | lm loss: 6.964073E+00 | loss scale: 32768.0 | grad norm: 185416.747 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2619/ 159576 | consumed samples: 41904 | elapsed time per iteration (ms): 13552.1 | learning rate: 1.161E-05 | global batch size: 16 | lm loss: 6.609634E+00 | loss scale: 32768.0 | grad norm: 157098.176 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2620/ 159576 | consumed samples: 41920 | elapsed time per iteration (ms): 13574.2 | learning rate: 1.162E-05 | global batch size: 16 | lm loss: 7.006974E+00 | loss scale: 32768.0 | grad norm: 140378.271 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2621/ 159576 | consumed samples: 41936 | elapsed time per iteration (ms): 13648.0 | learning rate: 1.162E-05 | global batch size: 16 | lm loss: 6.562167E+00 | loss scale: 32768.0 | grad norm: 169654.536 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2622/ 159576 | consumed samples: 41952 | elapsed time per iteration (ms): 13713.4 | learning rate: 1.163E-05 | global batch size: 16 | lm loss: 6.810758E+00 | loss scale: 32768.0 | grad norm: 209798.087 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2623/ 159576 | consumed samples: 41968 | elapsed time per iteration (ms): 13925.7 | learning rate: 1.163E-05 | global batch size: 16 | lm loss: 6.522465E+00 | loss scale: 32768.0 | grad norm: 119471.106 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2624/ 159576 | consumed samples: 41984 | elapsed time per iteration (ms): 13583.0 | learning rate: 1.164E-05 | global batch size: 16 | lm loss: 6.827784E+00 | loss scale: 32768.0 | grad norm: 115498.472 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2625/ 159576 | consumed samples: 42000 | elapsed time per iteration (ms): 13618.7 | learning rate: 1.164E-05 | global batch size: 16 | lm loss: 6.663583E+00 | loss scale: 32768.0 | grad norm: 131333.385 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2626/ 159576 | consumed samples: 42016 | elapsed time per iteration (ms): 13695.0 | learning rate: 1.164E-05 | global batch size: 16 | lm loss: 6.731676E+00 | loss scale: 32768.0 | grad norm: 105476.602 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2627/ 159576 | consumed samples: 42032 | elapsed time per iteration (ms): 14032.3 | learning rate: 1.165E-05 | global batch size: 16 | lm loss: 6.635394E+00 | loss scale: 32768.0 | grad norm: 155841.088 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2628/ 159576 | consumed samples: 42048 | elapsed time per iteration (ms): 13596.4 | learning rate: 1.165E-05 | global batch size: 16 | lm loss: 6.768427E+00 | loss scale: 32768.0 | grad norm: 91352.945 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2629/ 159576 | consumed samples: 42064 | elapsed time per iteration (ms): 13735.4 | learning rate: 1.166E-05 | global batch size: 16 | lm loss: 6.877464E+00 | loss scale: 32768.0 | grad norm: 246645.890 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2630/ 159576 | consumed samples: 42080 | elapsed time per iteration (ms): 13558.6 | learning rate: 1.166E-05 | global batch size: 16 | lm loss: 6.714092E+00 | loss scale: 32768.0 | grad norm: 131077.473 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2631/ 159576 | consumed samples: 42096 | elapsed time per iteration (ms): 14063.2 | learning rate: 1.167E-05 | global batch size: 16 | lm loss: 6.598214E+00 | loss scale: 32768.0 | grad norm: 142113.685 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2632/ 159576 | consumed samples: 42112 | elapsed time per iteration (ms): 13570.0 | learning rate: 1.167E-05 | global batch size: 16 | lm loss: 6.958339E+00 | loss scale: 32768.0 | grad norm: 196255.218 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2633/ 159576 | consumed samples: 42128 | elapsed time per iteration (ms): 13592.6 | learning rate: 1.168E-05 | global batch size: 16 | lm loss: 6.596231E+00 | loss scale: 32768.0 | grad norm: 167680.420 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2634/ 159576 | consumed samples: 42144 | elapsed time per iteration (ms): 13671.7 | learning rate: 1.168E-05 | global batch size: 16 | lm loss: 6.775526E+00 | loss scale: 32768.0 | grad norm: 111055.921 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2635/ 159576 | consumed samples: 42160 | elapsed time per iteration (ms): 13642.2 | learning rate: 1.168E-05 | global batch size: 16 | lm loss: 6.786438E+00 | loss scale: 32768.0 | grad norm: 146172.944 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2636/ 159576 | consumed samples: 42176 | elapsed time per iteration (ms): 14001.7 | learning rate: 1.169E-05 | global batch size: 16 | lm loss: 6.785826E+00 | loss scale: 32768.0 | grad norm: 101705.287 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2637/ 159576 | consumed samples: 42192 | elapsed time per iteration (ms): 13632.3 | learning rate: 1.169E-05 | global batch size: 16 | lm loss: 6.918137E+00 | loss scale: 32768.0 | grad norm: 359289.431 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2638/ 159576 | consumed samples: 42208 | elapsed time per iteration (ms): 13642.4 | learning rate: 1.170E-05 | global batch size: 16 | lm loss: 6.474925E+00 | loss scale: 32768.0 | grad norm: 210644.789 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2639/ 159576 | consumed samples: 42224 | elapsed time per iteration (ms): 13584.1 | learning rate: 1.170E-05 | global batch size: 16 | lm loss: 6.622705E+00 | loss scale: 32768.0 | grad norm: 159853.582 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2640/ 159576 | consumed samples: 42240 | elapsed time per iteration (ms): 13928.4 | learning rate: 1.171E-05 | global batch size: 16 | lm loss: 6.883276E+00 | loss scale: 32768.0 | grad norm: 134874.626 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2641/ 159576 | consumed samples: 42256 | elapsed time per iteration (ms): 13672.3 | learning rate: 1.171E-05 | global batch size: 16 | lm loss: 6.975843E+00 | loss scale: 32768.0 | grad norm: 136138.664 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2642/ 159576 | consumed samples: 42272 | elapsed time per iteration (ms): 13705.7 | learning rate: 1.172E-05 | global batch size: 16 | lm loss: 6.698567E+00 | loss scale: 32768.0 | grad norm: 132708.794 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2643/ 159576 | consumed samples: 42288 | elapsed time per iteration (ms): 13640.4 | learning rate: 1.172E-05 | global batch size: 16 | lm loss: 6.910300E+00 | loss scale: 32768.0 | grad norm: 128937.691 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2644/ 159576 | consumed samples: 42304 | elapsed time per iteration (ms): 13924.6 | learning rate: 1.172E-05 | global batch size: 16 | lm loss: 6.661136E+00 | loss scale: 32768.0 | grad norm: 144385.230 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2645/ 159576 | consumed samples: 42320 | elapsed time per iteration (ms): 13731.5 | learning rate: 1.173E-05 | global batch size: 16 | lm loss: 6.749330E+00 | loss scale: 32768.0 | grad norm: 136497.410 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2646/ 159576 | consumed samples: 42336 | elapsed time per iteration (ms): 13631.6 | learning rate: 1.173E-05 | global batch size: 16 | lm loss: 6.774727E+00 | loss scale: 32768.0 | grad norm: 157115.457 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2647/ 159576 | consumed samples: 42352 | elapsed time per iteration (ms): 13587.3 | learning rate: 1.174E-05 | global batch size: 16 | lm loss: 6.897247E+00 | loss scale: 32768.0 | grad norm: 122884.703 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2648/ 159576 | consumed samples: 42368 | elapsed time per iteration (ms): 13582.9 | learning rate: 1.174E-05 | global batch size: 16 | lm loss: 6.902627E+00 | loss scale: 32768.0 | grad norm: 136617.675 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2649/ 159576 | consumed samples: 42384 | elapsed time per iteration (ms): 14194.1 | learning rate: 1.175E-05 | global batch size: 16 | lm loss: 6.654990E+00 | loss scale: 32768.0 | grad norm: 121668.456 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2650/ 159576 | consumed samples: 42400 | elapsed time per iteration (ms): 13827.0 | learning rate: 1.175E-05 | global batch size: 16 | lm loss: 6.718140E+00 | loss scale: 32768.0 | grad norm: 94592.966 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2651/ 159576 | consumed samples: 42416 | elapsed time per iteration (ms): 13600.7 | learning rate: 1.176E-05 | global batch size: 16 | lm loss: 6.674122E+00 | loss scale: 32768.0 | grad norm: 105220.566 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2652/ 159576 | consumed samples: 42432 | elapsed time per iteration (ms): 13643.1 | learning rate: 1.176E-05 | global batch size: 16 | lm loss: 6.662145E+00 | loss scale: 32768.0 | grad norm: 222158.908 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2653/ 159576 | consumed samples: 42448 | elapsed time per iteration (ms): 13957.5 | learning rate: 1.176E-05 | global batch size: 16 | lm loss: 6.613699E+00 | loss scale: 32768.0 | grad norm: 110830.033 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2654/ 159576 | consumed samples: 42464 | elapsed time per iteration (ms): 13668.1 | learning rate: 1.177E-05 | global batch size: 16 | lm loss: 6.510882E+00 | loss scale: 32768.0 | grad norm: 143615.139 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2655/ 159576 | consumed samples: 42480 | elapsed time per iteration (ms): 13633.2 | learning rate: 1.177E-05 | global batch size: 16 | lm loss: 6.732093E+00 | loss scale: 32768.0 | grad norm: 159462.660 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2656/ 159576 | consumed samples: 42496 | elapsed time per iteration (ms): 13620.1 | learning rate: 1.178E-05 | global batch size: 16 | lm loss: 6.660037E+00 | loss scale: 32768.0 | grad norm: 244166.739 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2657/ 159576 | consumed samples: 42512 | elapsed time per iteration (ms): 13831.3 | learning rate: 1.178E-05 | global batch size: 16 | lm loss: 6.626472E+00 | loss scale: 32768.0 | grad norm: 149275.048 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2658/ 159576 | consumed samples: 42528 | elapsed time per iteration (ms): 13824.8 | learning rate: 1.179E-05 | global batch size: 16 | lm loss: 6.687421E+00 | loss scale: 32768.0 | grad norm: 139977.063 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2659/ 159576 | consumed samples: 42544 | elapsed time per iteration (ms): 13722.5 | learning rate: 1.179E-05 | global batch size: 16 | lm loss: 6.524724E+00 | loss scale: 32768.0 | grad norm: 106042.464 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2660/ 159576 | consumed samples: 42560 | elapsed time per iteration (ms): 13670.7 | learning rate: 1.180E-05 | global batch size: 16 | lm loss: 6.908322E+00 | loss scale: 32768.0 | grad norm: 201686.670 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2661/ 159576 | consumed samples: 42576 | elapsed time per iteration (ms): 13612.7 | learning rate: 1.180E-05 | global batch size: 16 | lm loss: 6.837928E+00 | loss scale: 32768.0 | grad norm: 126017.738 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2662/ 159576 | consumed samples: 42592 | elapsed time per iteration (ms): 13941.2 | learning rate: 1.180E-05 | global batch size: 16 | lm loss: 6.439098E+00 | loss scale: 32768.0 | grad norm: 160984.308 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2663/ 159576 | consumed samples: 42608 | elapsed time per iteration (ms): 13713.4 | learning rate: 1.181E-05 | global batch size: 16 | lm loss: 6.723923E+00 | loss scale: 32768.0 | grad norm: 139598.213 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2664/ 159576 | consumed samples: 42624 | elapsed time per iteration (ms): 6797.7 | learning rate: 1.181E-05 | global batch size: 16 | lm loss: 7.335284E+00 | loss scale: 32768.0 | grad norm: 139598.213 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2665/ 159576 | consumed samples: 42640 | elapsed time per iteration (ms): 13135.0 | learning rate: 1.181E-05 | global batch size: 16 | lm loss: 6.985713E+00 | loss scale: 32768.0 | grad norm: 180390.498 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2666/ 159576 | consumed samples: 42656 | elapsed time per iteration (ms): 13618.0 | learning rate: 1.182E-05 | global batch size: 16 | lm loss: 6.556298E+00 | loss scale: 32768.0 | grad norm: 144470.571 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2667/ 159576 | consumed samples: 42672 | elapsed time per iteration (ms): 14126.5 | learning rate: 1.182E-05 | global batch size: 16 | lm loss: 7.063251E+00 | loss scale: 32768.0 | grad norm: 146115.736 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2668/ 159576 | consumed samples: 42688 | elapsed time per iteration (ms): 13677.8 | learning rate: 1.183E-05 | global batch size: 16 | lm loss: 6.846446E+00 | loss scale: 32768.0 | grad norm: 164938.381 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2669/ 159576 | consumed samples: 42704 | elapsed time per iteration (ms): 13662.5 | learning rate: 1.183E-05 | global batch size: 16 | lm loss: 6.704443E+00 | loss scale: 32768.0 | grad norm: 183338.838 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2670/ 159576 | consumed samples: 42720 | elapsed time per iteration (ms): 13752.8 | learning rate: 1.184E-05 | global batch size: 16 | lm loss: 6.828314E+00 | loss scale: 32768.0 | grad norm: 291659.916 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2671/ 159576 | consumed samples: 42736 | elapsed time per iteration (ms): 14053.5 | learning rate: 1.184E-05 | global batch size: 16 | lm loss: 6.701608E+00 | loss scale: 32768.0 | grad norm: 137566.756 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2672/ 159576 | consumed samples: 42752 | elapsed time per iteration (ms): 13555.7 | learning rate: 1.184E-05 | global batch size: 16 | lm loss: 6.495778E+00 | loss scale: 32768.0 | grad norm: 140566.748 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2673/ 159576 | consumed samples: 42768 | elapsed time per iteration (ms): 13625.0 | learning rate: 1.185E-05 | global batch size: 16 | lm loss: 6.868438E+00 | loss scale: 32768.0 | grad norm: 137822.671 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2674/ 159576 | consumed samples: 42784 | elapsed time per iteration (ms): 13681.3 | learning rate: 1.185E-05 | global batch size: 16 | lm loss: 6.855990E+00 | loss scale: 32768.0 | grad norm: 217925.291 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2675/ 159576 | consumed samples: 42800 | elapsed time per iteration (ms): 13726.3 | learning rate: 1.186E-05 | global batch size: 16 | lm loss: 6.726338E+00 | loss scale: 32768.0 | grad norm: 169676.723 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2676/ 159576 | consumed samples: 42816 | elapsed time per iteration (ms): 14028.2 | learning rate: 1.186E-05 | global batch size: 16 | lm loss: 6.632861E+00 | loss scale: 32768.0 | grad norm: 146027.824 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2677/ 159576 | consumed samples: 42832 | elapsed time per iteration (ms): 13624.3 | learning rate: 1.187E-05 | global batch size: 16 | lm loss: 6.642831E+00 | loss scale: 32768.0 | grad norm: 163148.856 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2678/ 159576 | consumed samples: 42848 | elapsed time per iteration (ms): 13717.5 | learning rate: 1.187E-05 | global batch size: 16 | lm loss: 6.689285E+00 | loss scale: 32768.0 | grad norm: 129142.991 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2679/ 159576 | consumed samples: 42864 | elapsed time per iteration (ms): 13575.7 | learning rate: 1.188E-05 | global batch size: 16 | lm loss: 6.577474E+00 | loss scale: 32768.0 | grad norm: 168075.285 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2680/ 159576 | consumed samples: 42880 | elapsed time per iteration (ms): 13990.7 | learning rate: 1.188E-05 | global batch size: 16 | lm loss: 6.806996E+00 | loss scale: 32768.0 | grad norm: 138707.563 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2681/ 159576 | consumed samples: 42896 | elapsed time per iteration (ms): 13614.3 | learning rate: 1.188E-05 | global batch size: 16 | lm loss: 6.616170E+00 | loss scale: 32768.0 | grad norm: 138396.885 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2682/ 159576 | consumed samples: 42912 | elapsed time per iteration (ms): 13528.4 | learning rate: 1.189E-05 | global batch size: 16 | lm loss: 6.760321E+00 | loss scale: 32768.0 | grad norm: 146622.283 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2683/ 159576 | consumed samples: 42928 | elapsed time per iteration (ms): 13595.4 | learning rate: 1.189E-05 | global batch size: 16 | lm loss: 6.828167E+00 | loss scale: 32768.0 | grad norm: 205452.941 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2684/ 159576 | consumed samples: 42944 | elapsed time per iteration (ms): 14090.0 | learning rate: 1.190E-05 | global batch size: 16 | lm loss: 6.974781E+00 | loss scale: 32768.0 | grad norm: 141438.762 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2685/ 159576 | consumed samples: 42960 | elapsed time per iteration (ms): 13490.5 | learning rate: 1.190E-05 | global batch size: 16 | lm loss: 6.720265E+00 | loss scale: 32768.0 | grad norm: 131667.640 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2686/ 159576 | consumed samples: 42976 | elapsed time per iteration (ms): 13606.4 | learning rate: 1.191E-05 | global batch size: 16 | lm loss: 6.645846E+00 | loss scale: 32768.0 | grad norm: 143915.440 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2687/ 159576 | consumed samples: 42992 | elapsed time per iteration (ms): 13579.9 | learning rate: 1.191E-05 | global batch size: 16 | lm loss: 6.852206E+00 | loss scale: 32768.0 | grad norm: 206032.603 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2688/ 159576 | consumed samples: 43008 | elapsed time per iteration (ms): 13654.7 | learning rate: 1.192E-05 | global batch size: 16 | lm loss: 6.708066E+00 | loss scale: 32768.0 | grad norm: 135547.494 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2689/ 159576 | consumed samples: 43024 | elapsed time per iteration (ms): 13756.9 | learning rate: 1.192E-05 | global batch size: 16 | lm loss: 6.627333E+00 | loss scale: 32768.0 | grad norm: 103806.748 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2690/ 159576 | consumed samples: 43040 | elapsed time per iteration (ms): 13560.8 | learning rate: 1.192E-05 | global batch size: 16 | lm loss: 6.624159E+00 | loss scale: 32768.0 | grad norm: 204724.023 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2691/ 159576 | consumed samples: 43056 | elapsed time per iteration (ms): 13656.6 | learning rate: 1.193E-05 | global batch size: 16 | lm loss: 6.803893E+00 | loss scale: 32768.0 | grad norm: 123248.563 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2692/ 159576 | consumed samples: 43072 | elapsed time per iteration (ms): 13672.9 | learning rate: 1.193E-05 | global batch size: 16 | lm loss: 6.801785E+00 | loss scale: 32768.0 | grad norm: 140785.815 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2693/ 159576 | consumed samples: 43088 | elapsed time per iteration (ms): 14015.4 | learning rate: 1.194E-05 | global batch size: 16 | lm loss: 6.464381E+00 | loss scale: 32768.0 | grad norm: 131615.707 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2694/ 159576 | consumed samples: 43104 | elapsed time per iteration (ms): 13588.1 | learning rate: 1.194E-05 | global batch size: 16 | lm loss: 6.727094E+00 | loss scale: 32768.0 | grad norm: 213544.967 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2695/ 159576 | consumed samples: 43120 | elapsed time per iteration (ms): 13608.1 | learning rate: 1.195E-05 | global batch size: 16 | lm loss: 6.930735E+00 | loss scale: 32768.0 | grad norm: 179180.455 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2696/ 159576 | consumed samples: 43136 | elapsed time per iteration (ms): 13594.8 | learning rate: 1.195E-05 | global batch size: 16 | lm loss: 6.652137E+00 | loss scale: 32768.0 | grad norm: 171091.491 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2697/ 159576 | consumed samples: 43152 | elapsed time per iteration (ms): 13943.3 | learning rate: 1.196E-05 | global batch size: 16 | lm loss: 6.731685E+00 | loss scale: 32768.0 | grad norm: 151811.525 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2698/ 159576 | consumed samples: 43168 | elapsed time per iteration (ms): 13773.1 | learning rate: 1.196E-05 | global batch size: 16 | lm loss: 7.081783E+00 | loss scale: 32768.0 | grad norm: 132367.994 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2699/ 159576 | consumed samples: 43184 | elapsed time per iteration (ms): 13644.6 | learning rate: 1.196E-05 | global batch size: 16 | lm loss: 6.806893E+00 | loss scale: 32768.0 | grad norm: 319459.435 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2700/ 159576 | consumed samples: 43200 | elapsed time per iteration (ms): 13698.5 | learning rate: 1.197E-05 | global batch size: 16 | lm loss: 6.666497E+00 | loss scale: 32768.0 | grad norm: 120927.371 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2701/ 159576 | consumed samples: 43216 | elapsed time per iteration (ms): 13684.8 | learning rate: 1.197E-05 | global batch size: 16 | lm loss: 6.701412E+00 | loss scale: 32768.0 | grad norm: 150633.210 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2702/ 159576 | consumed samples: 43232 | elapsed time per iteration (ms): 13780.3 | learning rate: 1.198E-05 | global batch size: 16 | lm loss: 6.594296E+00 | loss scale: 32768.0 | grad norm: 161110.656 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2703/ 159576 | consumed samples: 43248 | elapsed time per iteration (ms): 13593.9 | learning rate: 1.198E-05 | global batch size: 16 | lm loss: 6.808178E+00 | loss scale: 32768.0 | grad norm: 258358.199 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2704/ 159576 | consumed samples: 43264 | elapsed time per iteration (ms): 13635.4 | learning rate: 1.199E-05 | global batch size: 16 | lm loss: 6.815506E+00 | loss scale: 32768.0 | grad norm: 183028.281 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2705/ 159576 | consumed samples: 43280 | elapsed time per iteration (ms): 13605.1 | learning rate: 1.199E-05 | global batch size: 16 | lm loss: 6.967249E+00 | loss scale: 32768.0 | grad norm: 243583.534 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2706/ 159576 | consumed samples: 43296 | elapsed time per iteration (ms): 14130.1 | learning rate: 1.200E-05 | global batch size: 16 | lm loss: 7.062543E+00 | loss scale: 32768.0 | grad norm: 207737.438 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2707/ 159576 | consumed samples: 43312 | elapsed time per iteration (ms): 13561.8 | learning rate: 1.200E-05 | global batch size: 16 | lm loss: 6.758321E+00 | loss scale: 32768.0 | grad norm: 146527.588 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2708/ 159576 | consumed samples: 43328 | elapsed time per iteration (ms): 13722.0 | learning rate: 1.200E-05 | global batch size: 16 | lm loss: 6.584868E+00 | loss scale: 32768.0 | grad norm: 272015.780 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2709/ 159576 | consumed samples: 43344 | elapsed time per iteration (ms): 13654.1 | learning rate: 1.201E-05 | global batch size: 16 | lm loss: 6.709559E+00 | loss scale: 32768.0 | grad norm: 284012.046 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2710/ 159576 | consumed samples: 43360 | elapsed time per iteration (ms): 13595.7 | learning rate: 1.201E-05 | global batch size: 16 | lm loss: 6.830414E+00 | loss scale: 32768.0 | grad norm: 149403.503 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2711/ 159576 | consumed samples: 43376 | elapsed time per iteration (ms): 13973.4 | learning rate: 1.202E-05 | global batch size: 16 | lm loss: 6.624958E+00 | loss scale: 32768.0 | grad norm: 146777.014 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2712/ 159576 | consumed samples: 43392 | elapsed time per iteration (ms): 13700.0 | learning rate: 1.202E-05 | global batch size: 16 | lm loss: 6.735670E+00 | loss scale: 32768.0 | grad norm: 136631.989 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2713/ 159576 | consumed samples: 43408 | elapsed time per iteration (ms): 13572.3 | learning rate: 1.203E-05 | global batch size: 16 | lm loss: 6.765169E+00 | loss scale: 32768.0 | grad norm: 280479.328 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2714/ 159576 | consumed samples: 43424 | elapsed time per iteration (ms): 13642.4 | learning rate: 1.203E-05 | global batch size: 16 | lm loss: 6.622662E+00 | loss scale: 32768.0 | grad norm: 160875.579 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2715/ 159576 | consumed samples: 43440 | elapsed time per iteration (ms): 14122.3 | learning rate: 1.204E-05 | global batch size: 16 | lm loss: 6.730956E+00 | loss scale: 32768.0 | grad norm: 206409.146 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2716/ 159576 | consumed samples: 43456 | elapsed time per iteration (ms): 13831.1 | learning rate: 1.204E-05 | global batch size: 16 | lm loss: 6.767645E+00 | loss scale: 32768.0 | grad norm: 149352.449 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2717/ 159576 | consumed samples: 43472 | elapsed time per iteration (ms): 13572.9 | learning rate: 1.204E-05 | global batch size: 16 | lm loss: 6.975914E+00 | loss scale: 32768.0 | grad norm: 119850.584 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2718/ 159576 | consumed samples: 43488 | elapsed time per iteration (ms): 13686.9 | learning rate: 1.205E-05 | global batch size: 16 | lm loss: 6.919794E+00 | loss scale: 32768.0 | grad norm: 172348.990 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2719/ 159576 | consumed samples: 43504 | elapsed time per iteration (ms): 13976.8 | learning rate: 1.205E-05 | global batch size: 16 | lm loss: 6.652202E+00 | loss scale: 32768.0 | grad norm: 178184.791 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2720/ 159576 | consumed samples: 43520 | elapsed time per iteration (ms): 13571.8 | learning rate: 1.206E-05 | global batch size: 16 | lm loss: 6.787558E+00 | loss scale: 32768.0 | grad norm: 130225.615 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2721/ 159576 | consumed samples: 43536 | elapsed time per iteration (ms): 13693.7 | learning rate: 1.206E-05 | global batch size: 16 | lm loss: 6.660249E+00 | loss scale: 32768.0 | grad norm: 144428.996 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2722/ 159576 | consumed samples: 43552 | elapsed time per iteration (ms): 13646.9 | learning rate: 1.207E-05 | global batch size: 16 | lm loss: 6.661267E+00 | loss scale: 32768.0
| grad norm: 121995.599 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2723/ 159576 | consumed samples: 43568 | elapsed time per iteration (ms): 13718.1 | learning rate: 1.207E-05 | global batch size: 16 | lm loss: 6.702977E+00 | loss scale: 32768.0 | grad norm: 205375.821 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2724/ 159576 | consumed samples: 43584 | elapsed time per iteration (ms): 14072.2 | learning rate: 1.208E-05 | global batch size: 16 | lm loss: 6.859900E+00 | loss scale: 32768.0 | grad norm: 174185.553 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2725/ 159576 | consumed samples: 43600 | elapsed time per iteration (ms): 13643.1 | learning rate: 1.208E-05 | global batch size: 16 | lm loss: 6.642687E+00 | loss scale: 32768.0 | grad norm: 124356.151 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2726/ 159576 | consumed samples: 43616 | elapsed time per iteration (ms): 13637.6 | learning rate: 1.208E-05 | global batch size: 16 | lm loss: 6.849540E+00 | loss scale: 32768.0 | grad norm: 187912.708 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2727/ 159576 | consumed samples: 43632 | elapsed time per iteration (ms): 13570.5 | learning rate: 1.209E-05 | global batch size: 16 | lm loss: 6.505477E+00 | loss scale: 32768.0 | grad norm: 146429.461 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2728/ 159576 | consumed samples: 43648 | elapsed time per iteration (ms): 14179.1 | learning rate: 1.209E-05 | global batch size: 16 | lm loss: 6.763928E+00 | loss scale: 32768.0 | grad norm: 143016.379 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2729/ 159576 | consumed samples: 
43664 | elapsed time per iteration (ms): 13666.5 | learning rate: 1.210E-05 | global batch size: 16 | lm loss: 6.746594E+00 | loss scale: 32768.0 | grad norm: 184649.070 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2730/ 159576 | consumed samples: 43680 | elapsed time per iteration (ms): 13666.9 | learning rate: 1.210E-05 | global batch size: 16 | lm loss: 6.822509E+00 | loss scale: 32768.0 | grad norm: 258599.749 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2731/ 159576 | consumed samples: 43696 | elapsed time per iteration (ms): 13722.5 | learning rate: 1.211E-05 | global batch size: 16 | lm loss: 6.726813E+00 | loss scale: 32768.0 | grad norm: 135253.086 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2732/ 159576 | consumed samples: 43712 | elapsed time per iteration (ms): 14110.6 | learning rate: 1.211E-05 | global batch size: 16 | lm loss: 6.642574E+00 | loss scale: 32768.0 | grad norm: 187051.418 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2733/ 159576 | consumed samples: 43728 | elapsed time per iteration (ms): 13665.7 | learning rate: 1.212E-05 | global batch size: 16 | lm loss: 6.608624E+00 | loss scale: 32768.0 | grad norm: 164163.009 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2734/ 159576 | consumed samples: 43744 | elapsed time per iteration (ms): 13624.6 | learning rate: 1.212E-05 | global batch size: 16 | lm loss: 6.755674E+00 | loss scale: 32768.0 | grad norm: 129230.586 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2735/ 159576 | consumed samples: 43760 | elapsed time per iteration (ms): 13617.1 | learning rate: 1.212E-05 | global batch size: 16 | lm loss: 6.771841E+00 | loss scale: 32768.0 | grad norm: 
254766.602 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2736/ 159576 | consumed samples: 43776 | elapsed time per iteration (ms): 13675.3 | learning rate: 1.213E-05 | global batch size: 16 | lm loss: 6.677852E+00 | loss scale: 32768.0 | grad norm: 142644.144 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2737/ 159576 | consumed samples: 43792 | elapsed time per iteration (ms): 13983.3 | learning rate: 1.213E-05 | global batch size: 16 | lm loss: 6.719501E+00 | loss scale: 32768.0 | grad norm: 164953.828 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2738/ 159576 | consumed samples: 43808 | elapsed time per iteration (ms): 13774.1 | learning rate: 1.214E-05 | global batch size: 16 | lm loss: 6.637510E+00 | loss scale: 32768.0 | grad norm: 161949.402 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2739/ 159576 | consumed samples: 43824 | elapsed time per iteration (ms): 13780.8 | learning rate: 1.214E-05 | global batch size: 16 | lm loss: 6.670253E+00 | loss scale: 32768.0 | grad norm: 132053.899 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2740/ 159576 | consumed samples: 43840 | elapsed time per iteration (ms): 13656.5 | learning rate: 1.215E-05 | global batch size: 16 | lm loss: 6.701370E+00 | loss scale: 32768.0 | grad norm: 158609.635 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2741/ 159576 | consumed samples: 43856 | elapsed time per iteration (ms): 13970.4 | learning rate: 1.215E-05 | global batch size: 16 | lm loss: 6.676120E+00 | loss scale: 32768.0 | grad norm: 133079.118 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2742/ 159576 | consumed samples: 43872 | elapsed 
time per iteration (ms): 13572.9 | learning rate: 1.216E-05 | global batch size: 16 | lm loss: 6.666083E+00 | loss scale: 32768.0 | grad norm: 121076.330 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2743/ 159576 | consumed samples: 43888 | elapsed time per iteration (ms): 13635.9 | learning rate: 1.216E-05 | global batch size: 16 | lm loss: 6.594894E+00 | loss scale: 32768.0 | grad norm: 206897.979 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2744/ 159576 | consumed samples: 43904 | elapsed time per iteration (ms): 13681.8 | learning rate: 1.216E-05 | global batch size: 16 | lm loss: 6.700480E+00 | loss scale: 32768.0 | grad norm: 126037.905 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2745/ 159576 | consumed samples: 43920 | elapsed time per iteration (ms): 13966.9 | learning rate: 1.217E-05 | global batch size: 16 | lm loss: 6.708483E+00 | loss scale: 32768.0 | grad norm: 136172.741 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2746/ 159576 | consumed samples: 43936 | elapsed time per iteration (ms): 13758.4 | learning rate: 1.217E-05 | global batch size: 16 | lm loss: 6.629419E+00 | loss scale: 32768.0 | grad norm: 142570.267 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2747/ 159576 | consumed samples: 43952 | elapsed time per iteration (ms): 13668.5 | learning rate: 1.218E-05 | global batch size: 16 | lm loss: 6.597517E+00 | loss scale: 32768.0 | grad norm: 155237.223 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2748/ 159576 | consumed samples: 43968 | elapsed time per iteration (ms): 13633.2 | learning rate: 1.218E-05 | global batch size: 16 | lm loss: 6.561327E+00 | loss scale: 32768.0 | grad norm: 162642.892 | num 
zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2749/ 159576 | consumed samples: 43984 | elapsed time per iteration (ms): 13608.4 | learning rate: 1.219E-05 | global batch size: 16 | lm loss: 6.677460E+00 | loss scale: 32768.0 | grad norm: 192650.212 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2750/ 159576 | consumed samples: 44000 | elapsed time per iteration (ms): 13886.7 | learning rate: 1.219E-05 | global batch size: 16 | lm loss: 6.649335E+00 | loss scale: 32768.0 | grad norm: 171673.975 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2751/ 159576 | consumed samples: 44016 | elapsed time per iteration (ms): 13671.6 | learning rate: 1.220E-05 | global batch size: 16 | lm loss: 6.735415E+00 | loss scale: 32768.0 | grad norm: 128822.354 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2752/ 159576 | consumed samples: 44032 | elapsed time per iteration (ms): 13708.1 | learning rate: 1.220E-05 | global batch size: 16 | lm loss: 6.679979E+00 | loss scale: 32768.0 | grad norm: 253310.737 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2753/ 159576 | consumed samples: 44048 | elapsed time per iteration (ms): 13770.7 | learning rate: 1.220E-05 | global batch size: 16 | lm loss: 6.565764E+00 | loss scale: 32768.0 | grad norm: 116179.545 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2754/ 159576 | consumed samples: 44064 | elapsed time per iteration (ms): 14066.6 | learning rate: 1.221E-05 | global batch size: 16 | lm loss: 6.742185E+00 | loss scale: 32768.0 | grad norm: 141403.598 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2755/ 159576 | consumed samples: 44080 | elapsed time per 
iteration (ms): 13651.8 | learning rate: 1.221E-05 | global batch size: 16 | lm loss: 6.762599E+00 | loss scale: 32768.0 | grad norm: 111172.995 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2756/ 159576 | consumed samples: 44096 | elapsed time per iteration (ms): 13694.5 | learning rate: 1.222E-05 | global batch size: 16 | lm loss: 6.733878E+00 | loss scale: 32768.0 | grad norm: 128168.972 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2757/ 159576 | consumed samples: 44112 | elapsed time per iteration (ms): 13604.8 | learning rate: 1.222E-05 | global batch size: 16 | lm loss: 6.588708E+00 | loss scale: 32768.0 | grad norm: 103022.500 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2758/ 159576 | consumed samples: 44128 | elapsed time per iteration (ms): 13653.9 | learning rate: 1.223E-05 | global batch size: 16 | lm loss: 6.562719E+00 | loss scale: 32768.0 | grad norm: 138192.892 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2759/ 159576 | consumed samples: 44144 | elapsed time per iteration (ms): 13986.1 | learning rate: 1.223E-05 | global batch size: 16 | lm loss: 6.738625E+00 | loss scale: 32768.0 | grad norm: 121839.165 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2760/ 159576 | consumed samples: 44160 | elapsed time per iteration (ms): 13725.3 | learning rate: 1.224E-05 | global batch size: 16 | lm loss: 6.566117E+00 | loss scale: 32768.0 | grad norm: 104901.052 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2761/ 159576 | consumed samples: 44176 | elapsed time per iteration (ms): 13770.1 | learning rate: 1.224E-05 | global batch size: 16 | lm loss: 6.666871E+00 | loss scale: 32768.0 | grad norm: 123398.519 | num zeros: 0.0 | 
number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2762/ 159576 | consumed samples: 44192 | elapsed time per iteration (ms): 13627.5 | learning rate: 1.224E-05 | global batch size: 16 | lm loss: 6.835371E+00 | loss scale: 32768.0 | grad norm: 112214.547 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2763/ 159576 | consumed samples: 44208 | elapsed time per iteration (ms): 14068.3 | learning rate: 1.225E-05 | global batch size: 16 | lm loss: 6.804303E+00 | loss scale: 32768.0 | grad norm: 122506.789 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2764/ 159576 | consumed samples: 44224 | elapsed time per iteration (ms): 6917.6 | learning rate: 1.225E-05 | global batch size: 16 | lm loss: 6.972560E+00 | loss scale: 16384.0 | grad norm: 122506.789 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2765/ 159576 | consumed samples: 44240 | elapsed time per iteration (ms): 13181.9 | learning rate: 1.225E-05 | global batch size: 16 | lm loss: 6.580292E+00 | loss scale: 16384.0 | grad norm: 59992.079 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2766/ 159576 | consumed samples: 44256 | elapsed time per iteration (ms): 13680.1 | learning rate: 1.226E-05 | global batch size: 16 | lm loss: 6.724333E+00 | loss scale: 16384.0 | grad norm: 77015.113 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2767/ 159576 | consumed samples: 44272 | elapsed time per iteration (ms): 13716.6 | learning rate: 1.226E-05 | global batch size: 16 | lm loss: 6.933354E+00 | loss scale: 16384.0 | grad norm: 85522.390 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2768/ 159576 | consumed samples: 44288 | elapsed time per iteration (ms): 13994.0 | 
learning rate: 1.227E-05 | global batch size: 16 | lm loss: 6.648163E+00 | loss scale: 16384.0 | grad norm: 58295.975 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2769/ 159576 | consumed samples: 44304 | elapsed time per iteration (ms): 13658.9 | learning rate: 1.227E-05 | global batch size: 16 | lm loss: 6.891530E+00 | loss scale: 16384.0 | grad norm: 75446.588 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2770/ 159576 | consumed samples: 44320 | elapsed time per iteration (ms): 13703.7 | learning rate: 1.228E-05 | global batch size: 16 | lm loss: 6.591332E+00 | loss scale: 16384.0 | grad norm: 59290.056 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2771/ 159576 | consumed samples: 44336 | elapsed time per iteration (ms): 13716.9 | learning rate: 1.228E-05 | global batch size: 16 | lm loss: 6.737020E+00 | loss scale: 16384.0 | grad norm: 51929.323 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2772/ 159576 | consumed samples: 44352 | elapsed time per iteration (ms): 14010.7 | learning rate: 1.228E-05 | global batch size: 16 | lm loss: 6.565439E+00 | loss scale: 16384.0 | grad norm: 100304.309 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2773/ 159576 | consumed samples: 44368 | elapsed time per iteration (ms): 13566.2 | learning rate: 1.229E-05 | global batch size: 16 | lm loss: 6.887408E+00 | loss scale: 16384.0 | grad norm: 86699.024 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2774/ 159576 | consumed samples: 44384 | elapsed time per iteration (ms): 13639.1 | learning rate: 1.229E-05 | global batch size: 16 | lm loss: 6.766156E+00 | loss scale: 16384.0 | grad norm: 64840.948 | num zeros: 0.0 | number of skipped iterations: 0 
| number of nan iterations: 0 | time (ms) iteration 2775/ 159576 | consumed samples: 44400 | elapsed time per iteration (ms): 13646.1 | learning rate: 1.230E-05 | global batch size: 16 | lm loss: 6.640082E+00 | loss scale: 16384.0 | grad norm: 61943.696 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2776/ 159576 | consumed samples: 44416 | elapsed time per iteration (ms): 13670.4 | learning rate: 1.230E-05 | global batch size: 16 | lm loss: 6.784959E+00 | loss scale: 16384.0 | grad norm: 68978.844 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2777/ 159576 | consumed samples: 44432 | elapsed time per iteration (ms): 14012.8 | learning rate: 1.231E-05 | global batch size: 16 | lm loss: 6.670368E+00 | loss scale: 16384.0 | grad norm: 58668.320 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2778/ 159576 | consumed samples: 44448 | elapsed time per iteration (ms): 13651.5 | learning rate: 1.231E-05 | global batch size: 16 | lm loss: 6.849538E+00 | loss scale: 16384.0 | grad norm: 53539.454 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2779/ 159576 | consumed samples: 44464 | elapsed time per iteration (ms): 13531.1 | learning rate: 1.232E-05 | global batch size: 16 | lm loss: 6.710807E+00 | loss scale: 16384.0 | grad norm: 58047.417 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2780/ 159576 | consumed samples: 44480 | elapsed time per iteration (ms): 13601.2 | learning rate: 1.232E-05 | global batch size: 16 | lm loss: 6.803576E+00 | loss scale: 16384.0 | grad norm: 61014.969 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2781/ 159576 | consumed samples: 44496 | elapsed time per iteration (ms): 14011.6 | learning rate: 1.232E-05 | global 
batch size: 16 | lm loss: 6.435648E+00 | loss scale: 16384.0 | grad norm: 72928.257 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2782/ 159576 | consumed samples: 44512 | elapsed time per iteration (ms): 13706.9 | learning rate: 1.233E-05 | global batch size: 16 | lm loss: 6.689322E+00 | loss scale: 16384.0 | grad norm: 45124.285 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2783/ 159576 | consumed samples: 44528 | elapsed time per iteration (ms): 13638.0 | learning rate: 1.233E-05 | global batch size: 16 | lm loss: 6.796506E+00 | loss scale: 16384.0 | grad norm: 61254.307 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2784/ 159576 | consumed samples: 44544 | elapsed time per iteration (ms): 13617.3 | learning rate: 1.234E-05 | global batch size: 16 | lm loss: 6.726316E+00 | loss scale: 16384.0 | grad norm: 58102.179 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2785/ 159576 | consumed samples: 44560 | elapsed time per iteration (ms): 13946.8 | learning rate: 1.234E-05 | global batch size: 16 | lm loss: 6.648038E+00 | loss scale: 16384.0 | grad norm: 68282.211 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2786/ 159576 | consumed samples: 44576 | elapsed time per iteration (ms): 13594.9 | learning rate: 1.235E-05 | global batch size: 16 | lm loss: 6.860110E+00 | loss scale: 16384.0 | grad norm: 70475.870 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2787/ 159576 | consumed samples: 44592 | elapsed time per iteration (ms): 13607.8 | learning rate: 1.235E-05 | global batch size: 16 | lm loss: 6.821939E+00 | loss scale: 16384.0 | grad norm: 56499.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | 
time (ms) iteration 2788/ 159576 | consumed samples: 44608 | elapsed time per iteration (ms): 13592.1 | learning rate: 1.236E-05 | global batch size: 16 | lm loss: 6.702363E+00 | loss scale: 16384.0 | grad norm: 71878.494 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2789/ 159576 | consumed samples: 44624 | elapsed time per iteration (ms): 13633.0 | learning rate: 1.236E-05 | global batch size: 16 | lm loss: 6.596258E+00 | loss scale: 16384.0 | grad norm: 57167.131 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2790/ 159576 | consumed samples: 44640 | elapsed time per iteration (ms): 13806.2 | learning rate: 1.236E-05 | global batch size: 16 | lm loss: 6.742100E+00 | loss scale: 16384.0 | grad norm: 78591.535 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2791/ 159576 | consumed samples: 44656 | elapsed time per iteration (ms): 13659.4 | learning rate: 1.237E-05 | global batch size: 16 | lm loss: 6.602869E+00 | loss scale: 16384.0 | grad norm: 68726.337 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2792/ 159576 | consumed samples: 44672 | elapsed time per iteration (ms): 13592.2 | learning rate: 1.237E-05 | global batch size: 16 | lm loss: 6.708993E+00 | loss scale: 16384.0 | grad norm: 98214.491 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2793/ 159576 | consumed samples: 44688 | elapsed time per iteration (ms): 13507.3 | learning rate: 1.238E-05 | global batch size: 16 | lm loss: 6.616965E+00 | loss scale: 16384.0 | grad norm: 72150.719 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2794/ 159576 | consumed samples: 44704 | elapsed time per iteration (ms): 13955.1 | learning rate: 1.238E-05 | global batch size: 16 | lm loss: 
6.607640E+00 | loss scale: 16384.0 | grad norm: 62728.696 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2795/ 159576 | consumed samples: 44720 | elapsed time per iteration (ms): 13531.1 | learning rate: 1.239E-05 | global batch size: 16 | lm loss: 6.875388E+00 | loss scale: 16384.0 | grad norm: 94768.672 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2796/ 159576 | consumed samples: 44736 | elapsed time per iteration (ms): 13614.2 | learning rate: 1.239E-05 | global batch size: 16 | lm loss: 6.827682E+00 | loss scale: 16384.0 | grad norm: 59818.476 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2797/ 159576 | consumed samples: 44752 | elapsed time per iteration (ms): 13620.6 | learning rate: 1.239E-05 | global batch size: 16 | lm loss: 6.522869E+00 | loss scale: 16384.0 | grad norm: 74009.172 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2798/ 159576 | consumed samples: 44768 | elapsed time per iteration (ms): 13985.4 | learning rate: 1.240E-05 | global batch size: 16 | lm loss: 6.654684E+00 | loss scale: 16384.0 | grad norm: 54913.035 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2799/ 159576 | consumed samples: 44784 | elapsed time per iteration (ms): 13759.4 | learning rate: 1.240E-05 | global batch size: 16 | lm loss: 6.544140E+00 | loss scale: 16384.0 | grad norm: 83654.114 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2800/ 159576 | consumed samples: 44800 | elapsed time per iteration (ms): 13524.0 | learning rate: 1.241E-05 | global batch size: 16 | lm loss: 6.798269E+00 | loss scale: 16384.0 | grad norm: 80678.341 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2801/ 
159576 | consumed samples: 44816 | elapsed time per iteration (ms): 13646.5 | learning rate: 1.241E-05 | global batch size: 16 | lm loss: 6.872281E+00 | loss scale: 16384.0 | grad norm: 49084.933 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2802/ 159576 | consumed samples: 44832 | elapsed time per iteration (ms): 13614.0 | learning rate: 1.242E-05 | global batch size: 16 | lm loss: 6.733764E+00 | loss scale: 16384.0 | grad norm: 88585.751 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2803/ 159576 | consumed samples: 44848 | elapsed time per iteration (ms): 13792.4 | learning rate: 1.242E-05 | global batch size: 16 | lm loss: 6.865559E+00 | loss scale: 16384.0 | grad norm: 48186.949 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2804/ 159576 | consumed samples: 44864 | elapsed time per iteration (ms): 13655.0 | learning rate: 1.243E-05 | global batch size: 16 | lm loss: 6.631515E+00 | loss scale: 16384.0 | grad norm: 66281.190 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2805/ 159576 | consumed samples: 44880 | elapsed time per iteration (ms): 13605.4 | learning rate: 1.243E-05 | global batch size: 16 | lm loss: 6.593436E+00 | loss scale: 16384.0 | grad norm: 66274.800 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2806/ 159576 | consumed samples: 44896 | elapsed time per iteration (ms): 13611.6 | learning rate: 1.243E-05 | global batch size: 16 | lm loss: 6.692297E+00 | loss scale: 16384.0 | grad norm: 66535.812 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2807/ 159576 | consumed samples: 44912 | elapsed time per iteration (ms): 13924.4 | learning rate: 1.244E-05 | global batch size: 16 | lm loss: 6.564488E+00 | loss scale: 
16384.0 | grad norm: 62289.026 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2808/ 159576 | consumed samples: 44928 | elapsed time per iteration (ms): 13559.5 | learning rate: 1.244E-05 | global batch size: 16 | lm loss: 6.775381E+00 | loss scale: 16384.0 | grad norm: 51114.400 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2809/ 159576 | consumed samples: 44944 | elapsed time per iteration (ms): 13579.6 | learning rate: 1.245E-05 | global batch size: 16 | lm loss: 6.854599E+00 | loss scale: 16384.0 | grad norm: 78574.479 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2810/ 159576 | consumed samples: 44960 | elapsed time per iteration (ms): 13568.8 | learning rate: 1.245E-05 | global batch size: 16 | lm loss: 6.641658E+00 | loss scale: 16384.0 | grad norm: 48054.399 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2811/ 159576 | consumed samples: 44976 | elapsed time per iteration (ms): 13577.2 | learning rate: 1.246E-05 | global batch size: 16 | lm loss: 6.804714E+00 | loss scale: 16384.0 | grad norm: 85293.239 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2812/ 159576 | consumed samples: 44992 | elapsed time per iteration (ms): 13780.4 | learning rate: 1.246E-05 | global batch size: 16 | lm loss: 6.484572E+00 | loss scale: 16384.0 | grad norm: 54599.094 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2813/ 159576 | consumed samples: 45008 | elapsed time per iteration (ms): 13630.2 | learning rate: 1.247E-05 | global batch size: 16 | lm loss: 6.495656E+00 | loss scale: 16384.0 | grad norm: 131722.081 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2814/ 159576 | consumed samples: 
45024 | elapsed time per iteration (ms): 13626.8 | learning rate: 1.247E-05 | global batch size: 16 | lm loss: 6.894939E+00 | loss scale: 16384.0 | grad norm: 102881.431 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2815/ 159576 | consumed samples: 45040 | elapsed time per iteration (ms): 13599.0 | learning rate: 1.247E-05 | global batch size: 16 | lm loss: 6.883965E+00 | loss scale: 16384.0 | grad norm: 72100.325 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2816/ 159576 | consumed samples: 45056 | elapsed time per iteration (ms): 14052.1 | learning rate: 1.248E-05 | global batch size: 16 | lm loss: 6.573022E+00 | loss scale: 16384.0 | grad norm: 72968.507 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2817/ 159576 | consumed samples: 45072 | elapsed time per iteration (ms): 13541.1 | learning rate: 1.248E-05 | global batch size: 16 | lm loss: 6.646833E+00 | loss scale: 16384.0 | grad norm: 90510.016 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2818/ 159576 | consumed samples: 45088 | elapsed time per iteration (ms): 13597.7 | learning rate: 1.249E-05 | global batch size: 16 | lm loss: 6.898618E+00 | loss scale: 16384.0 | grad norm: 90037.566 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2819/ 159576 | consumed samples: 45104 | elapsed time per iteration (ms): 13575.0 | learning rate: 1.249E-05 | global batch size: 16 | lm loss: 6.547668E+00 | loss scale: 16384.0 | grad norm: 79277.803 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2820/ 159576 | consumed samples: 45120 | elapsed time per iteration (ms): 14016.3 | learning rate: 1.250E-05 | global batch size: 16 | lm loss: 6.791230E+00 | loss scale: 16384.0 | grad norm: 63437.139 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2821/ 159576 | consumed samples: 45136 | elapsed time per iteration (ms): 13565.5 | learning rate: 1.250E-05 | global batch size: 16 | lm loss: 6.957808E+00 | loss scale: 16384.0 | grad norm: 56738.743 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2822/ 159576 | consumed samples: 45152 | elapsed time per iteration (ms): 13564.0 | learning rate: 1.251E-05 | global batch size: 16 | lm loss: 6.729958E+00 | loss scale: 16384.0 | grad norm: 93778.013 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2823/ 159576 | consumed samples: 45168 | elapsed time per iteration (ms): 13650.0 | learning rate: 1.251E-05 | global batch size: 16 | lm loss: 6.480144E+00 | loss scale: 16384.0 | grad norm: 60246.483 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2824/ 159576 | consumed samples: 45184 | elapsed time per iteration (ms): 13511.5 | learning rate: 1.251E-05 | global batch size: 16 | lm loss: 6.595847E+00 | loss scale: 16384.0 | grad norm: 63557.308 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2825/ 159576 | consumed samples: 45200 | elapsed time per iteration (ms): 13655.5 | learning rate: 1.252E-05 | global batch size: 16 | lm loss: 6.689149E+00 | loss scale: 16384.0 | grad norm: 67372.582 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2826/ 159576 | consumed samples: 45216 | elapsed time per iteration (ms): 13638.0 | learning rate: 1.252E-05 | global batch size: 16 | lm loss: 6.689507E+00 | loss scale: 16384.0 | grad norm: 69124.069 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2827/ 159576 | consumed samples: 45232 | elapsed time per iteration (ms): 13546.1 | learning rate: 1.253E-05 | global batch size: 16 | lm loss: 6.457958E+00 | loss scale: 16384.0 | grad norm: 56160.018 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2828/ 159576 | consumed samples: 45248 | elapsed time per iteration (ms): 13610.9 | learning rate: 1.253E-05 | global batch size: 16 | lm loss: 6.815155E+00 | loss scale: 16384.0 | grad norm: 61009.082 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2829/ 159576 | consumed samples: 45264 | elapsed time per iteration (ms): 13930.1 | learning rate: 1.254E-05 | global batch size: 16 | lm loss: 6.595886E+00 | loss scale: 16384.0 | grad norm: 45906.579 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2830/ 159576 | consumed samples: 45280 | elapsed time per iteration (ms): 13608.1 | learning rate: 1.254E-05 | global batch size: 16 | lm loss: 6.642846E+00 | loss scale: 16384.0 | grad norm: 74796.317 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2831/ 159576 | consumed samples: 45296 | elapsed time per iteration (ms): 13539.5 | learning rate: 1.255E-05 | global batch size: 16 | lm loss: 6.810493E+00 | loss scale: 16384.0 | grad norm: 64536.090 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2832/ 159576 | consumed samples: 45312 | elapsed time per iteration (ms): 13571.9 | learning rate: 1.255E-05 | global batch size: 16 | lm loss: 6.742997E+00 | loss scale: 16384.0 | grad norm: 54697.232 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2833/ 159576 | consumed samples: 45328 | elapsed time per iteration (ms): 13626.3 | learning rate: 1.255E-05 | global batch size: 16 | lm loss: 6.734198E+00 | loss scale: 16384.0 | grad norm: 124253.673 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2834/ 159576 | consumed samples: 45344 | elapsed time per iteration (ms): 13905.8 | learning rate: 1.256E-05 | global batch size: 16 | lm loss: 6.652834E+00 | loss scale: 16384.0 | grad norm: 71659.235 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
[2021-09-24 13:06:39] PULSE: tr8-104B is waiting for the previous job to finish before scheduling a new one using the dependency mechanism (1165978_[1-10%1] on 'gpu_p13' partition)
[2021-09-24 13:06:39] PULSE: tr8-104B is running for 7:14:28 since 2021-09-24T05:52:11 (1162855_1 on 'gpu_p13' partition (r6i4n[5,7],r6i5n[2,7-8],r6i6n[0,2,6],r7i2n[4-5],r7i6n[2-4],r7i7n[7-8],r8i0n[2-3,5-8],r8i1n[0,2-4],r8i2n8,r8i3n[0-2],r8i5n[3-4],r8i7n[3-8],r9i0n[0-2],r9i1n[0-3],r9i2n[3-5,8],r9i3n[0-1,7-8],r9i4n[0-2],r9i5n[3-8],r9i6n[0,7-8])
iteration 2835/ 159576 | consumed samples: 45360 | elapsed time per iteration (ms): 13609.5 | learning rate: 1.256E-05 | global batch size: 16 | lm loss: 6.789959E+00 | loss scale: 16384.0 | grad norm: 73488.360 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2836/ 159576 | consumed samples: 45376 | elapsed time per iteration (ms): 13614.7 | learning rate: 1.257E-05 | global batch size: 16 | lm loss: 6.695529E+00 | loss scale: 16384.0 | grad norm: 69307.839 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2837/ 159576 | consumed samples: 45392 | elapsed time per iteration (ms): 13634.1 | learning rate: 1.257E-05 | global batch size: 16 | lm loss: 6.550642E+00 | loss scale: 16384.0 | grad norm: 88157.717 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2838/ 159576 | consumed samples: 45408 | elapsed time per iteration (ms): 14029.3 | learning rate: 1.258E-05 | global batch size: 16 | lm loss: 6.745864E+00 | loss scale: 16384.0 | grad norm: 79032.308 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2839/ 159576 | consumed samples: 45424 | elapsed time per iteration (ms): 13631.7 | learning rate: 1.258E-05 | global batch size: 16 | lm loss: 7.013217E+00 | loss scale: 16384.0 | grad norm: 90598.683 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2840/ 159576 | consumed samples: 45440 | elapsed time per iteration (ms): 13552.2 | learning rate: 1.259E-05 | global batch size: 16 | lm loss: 6.791473E+00 | loss scale: 16384.0 | grad norm: 66761.526 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2841/ 159576 | consumed samples: 45456 | elapsed time per iteration (ms): 13585.4 | learning rate: 1.259E-05 | global batch size: 16 | lm loss: 6.639102E+00 | loss scale: 16384.0 | grad norm: 75945.826 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2842/ 159576 | consumed samples: 45472 | elapsed time per iteration (ms): 14005.5 | learning rate: 1.259E-05 | global batch size: 16 | lm loss: 6.750570E+00 | loss scale: 16384.0 | grad norm: 52422.045 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2843/ 159576 | consumed samples: 45488 | elapsed time per iteration (ms): 13637.6 | learning rate: 1.260E-05 | global batch size: 16 | lm loss: 6.761233E+00 | loss scale: 16384.0 | grad norm: 96201.801 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2844/ 159576 | consumed samples: 45504 | elapsed time per iteration (ms): 13605.0 | learning rate: 1.260E-05 | global batch size: 16 | lm loss: 6.869712E+00 | loss scale: 16384.0 | grad norm: 85259.060 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2845/ 159576 | consumed samples: 45520 | elapsed time per iteration (ms): 13489.6 | learning rate: 1.261E-05 | global batch size: 16 | lm loss: 6.754227E+00 | loss scale: 16384.0 | grad norm: 71430.457 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2846/ 159576 | consumed samples: 45536 | elapsed time per iteration (ms): 13633.0 | learning rate: 1.261E-05 | global batch size: 16 | lm loss: 6.681328E+00 | loss scale: 16384.0 | grad norm: 64498.648 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2847/ 159576 | consumed samples: 45552 | elapsed time per iteration (ms): 13680.5 | learning rate: 1.262E-05 | global batch size: 16 | lm loss: 6.708944E+00 | loss scale: 16384.0 | grad norm: 99300.512 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2848/ 159576 | consumed samples: 45568 | elapsed time per iteration (ms): 13578.9 | learning rate: 1.262E-05 | global batch size: 16 | lm loss: 6.689048E+00 | loss scale: 16384.0 | grad norm: 90482.932 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2849/ 159576 | consumed samples: 45584 | elapsed time per iteration (ms): 13613.6 | learning rate: 1.263E-05 | global batch size: 16 | lm loss: 6.673044E+00 | loss scale: 16384.0 | grad norm: 59461.342 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2850/ 159576 | consumed samples: 45600 | elapsed time per iteration (ms): 13675.0 | learning rate: 1.263E-05 | global batch size: 16 | lm loss: 6.738005E+00 | loss scale: 16384.0 | grad norm: 101125.933 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2851/ 159576 | consumed samples: 45616 | elapsed time per iteration (ms): 13897.5 | learning rate: 1.263E-05 | global batch size: 16 | lm loss: 6.522173E+00 | loss scale: 16384.0 | grad norm: 90321.174 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2852/ 159576 | consumed samples: 45632 | elapsed time per iteration (ms): 13599.3 | learning rate: 1.264E-05 | global batch size: 16 | lm loss: 6.524035E+00 | loss scale: 16384.0 | grad norm: 70117.709 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2853/ 159576 | consumed samples: 45648 | elapsed time per iteration (ms): 13643.7 | learning rate: 1.264E-05 | global batch size: 16 | lm loss: 6.510409E+00 | loss scale: 16384.0 | grad norm: 64993.085 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2854/ 159576 | consumed samples: 45664 | elapsed time per iteration (ms): 13552.1 | learning rate: 1.265E-05 | global batch size: 16 | lm loss: 6.913634E+00 | loss scale: 16384.0 | grad norm: 106101.567 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2855/ 159576 | consumed samples: 45680 | elapsed time per iteration (ms): 13759.3 | learning rate: 1.265E-05 | global batch size: 16 | lm loss: 6.640407E+00 | loss scale: 16384.0 | grad norm: 114581.301 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2856/ 159576 | consumed samples: 45696 | elapsed time per iteration (ms): 13808.3 | learning rate: 1.266E-05 | global batch size: 16 | lm loss: 6.781041E+00 | loss scale: 16384.0 | grad norm: 56604.166 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2857/ 159576 | consumed samples: 45712 | elapsed time per iteration (ms): 13620.2 | learning rate: 1.266E-05 | global batch size: 16 | lm loss: 6.794811E+00 | loss scale: 16384.0 | grad norm: 60150.039 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2858/ 159576 | consumed samples: 45728 | elapsed time per iteration (ms): 13675.9 | learning rate: 1.267E-05 | global batch size: 16 | lm loss: 6.586791E+00 | loss scale: 16384.0 | grad norm: 100786.813 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2859/ 159576 | consumed samples: 45744 | elapsed time per iteration (ms): 13583.4 | learning rate: 1.267E-05 | global batch size: 16 | lm loss: 6.762810E+00 | loss scale: 16384.0 | grad norm: 82968.484 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2860/ 159576 | consumed samples: 45760 | elapsed time per iteration (ms): 13906.7 | learning rate: 1.267E-05 | global batch size: 16 | lm loss: 6.739496E+00 | loss scale: 16384.0 | grad norm: 51306.674 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2861/ 159576 | consumed samples: 45776 | elapsed time per iteration (ms): 13619.1 | learning rate: 1.268E-05 | global batch size: 16 | lm loss: 6.046006E+00 | loss scale: 16384.0 | grad norm: 70726.137 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2862/ 159576 | consumed samples: 45792 | elapsed time per iteration (ms): 13544.2 | learning rate: 1.268E-05 | global batch size: 16 | lm loss: 6.803837E+00 | loss scale: 16384.0 | grad norm: 68740.252 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2863/ 159576 | consumed samples: 45808 | elapsed time per iteration (ms): 13610.8 | learning rate: 1.269E-05 | global batch size: 16 | lm loss: 6.770112E+00 | loss scale: 16384.0 | grad norm: 139814.235 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2864/ 159576 | consumed samples: 45824 | elapsed time per iteration (ms): 13958.0 | learning rate: 1.269E-05 | global batch size: 16 | lm loss: 6.750904E+00 | loss scale: 16384.0 | grad norm: 77621.986 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2865/ 159576 | consumed samples: 45840 | elapsed time per iteration (ms): 13670.7 | learning rate: 1.270E-05 | global batch size: 16 | lm loss: 6.696413E+00 | loss scale: 16384.0 | grad norm: 71170.214 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2866/ 159576 | consumed samples: 45856 | elapsed time per iteration (ms): 13638.6 | learning rate: 1.270E-05 | global batch size: 16 | lm loss: 6.704915E+00 | loss scale: 16384.0 | grad norm: 101640.576 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2867/ 159576 | consumed samples: 45872 | elapsed time per iteration (ms): 13607.2 | learning rate: 1.271E-05 | global batch size: 16 | lm loss: 6.825719E+00 | loss scale: 16384.0 | grad norm: 75740.165 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2868/ 159576 | consumed samples: 45888 | elapsed time per iteration (ms): 13630.4 | learning rate: 1.271E-05 | global batch size: 16 | lm loss: 6.287379E+00 | loss scale: 16384.0 | grad norm: 102389.724 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2869/ 159576 | consumed samples: 45904 | elapsed time per iteration (ms): 13745.4 | learning rate: 1.271E-05 | global batch size: 16 | lm loss: 6.541815E+00 | loss scale: 16384.0 | grad norm: 70149.993 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2870/ 159576 | consumed samples: 45920 | elapsed time per iteration (ms): 13607.8 | learning rate: 1.272E-05 | global batch size: 16 | lm loss: 6.516257E+00 | loss scale: 16384.0 | grad norm: 75996.012 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2871/ 159576 | consumed samples: 45936 | elapsed time per iteration (ms): 13612.1 | learning rate: 1.272E-05 | global batch size: 16 | lm loss: 6.478125E+00 | loss scale: 16384.0 | grad norm: 71923.342 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2872/ 159576 | consumed samples: 45952 | elapsed time per iteration (ms): 13608.0 | learning rate: 1.273E-05 | global batch size: 16 | lm loss: 6.691109E+00 | loss scale: 16384.0 | grad norm: 87426.398 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2873/ 159576 | consumed samples: 45968 | elapsed time per iteration (ms): 13976.7 | learning rate: 1.273E-05 | global batch size: 16 | lm loss: 6.620930E+00 | loss scale: 16384.0 | grad norm: 104041.099 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2874/ 159576 | consumed samples: 45984 | elapsed time per iteration (ms): 13607.9 | learning rate: 1.274E-05 | global batch size: 16 | lm loss: 6.744573E+00 | loss scale: 16384.0 | grad norm: 69927.423 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2875/ 159576 | consumed samples: 46000 | elapsed time per iteration (ms): 13661.2 | learning rate: 1.274E-05 | global batch size: 16 | lm loss: 6.676423E+00 | loss scale: 16384.0 | grad norm: 51002.581 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2876/ 159576 | consumed samples: 46016 | elapsed time per iteration (ms): 13531.2 | learning rate: 1.275E-05 | global batch size: 16 | lm loss: 6.802640E+00 | loss scale: 16384.0 | grad norm: 87004.969 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2877/ 159576 | consumed samples: 46032 | elapsed time per iteration (ms): 13901.7 | learning rate: 1.275E-05 | global batch size: 16 | lm loss: 6.729659E+00 | loss scale: 16384.0 | grad norm: 50767.745 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2878/ 159576 | consumed samples: 46048 | elapsed time per iteration (ms): 13702.1 | learning rate: 1.275E-05 | global batch size: 16 | lm loss: 6.922673E+00 | loss scale: 16384.0 | grad norm: 121433.743 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2879/ 159576 | consumed samples: 46064 | elapsed time per iteration (ms): 13605.9 | learning rate: 1.276E-05 | global batch size: 16 | lm loss: 6.701990E+00 | loss scale: 16384.0 | grad norm: 78796.373 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2880/ 159576 | consumed samples: 46080 | elapsed time per iteration (ms): 13615.6 | learning rate: 1.276E-05 | global batch size: 16 | lm loss: 6.650718E+00 | loss scale: 16384.0 | grad norm: 68193.269 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2881/ 159576 | consumed samples: 46096 | elapsed time per iteration (ms): 13595.5 | learning rate: 1.277E-05 | global batch size: 16 | lm loss: 6.732479E+00 | loss scale: 16384.0 | grad norm: 69049.929 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2882/ 159576 | consumed samples: 46112 | elapsed time per iteration (ms): 13888.6 | learning rate: 1.277E-05 | global batch size: 16 | lm loss: 6.563155E+00 | loss scale: 16384.0 | grad norm: 84383.583 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2883/ 159576 | consumed samples: 46128 | elapsed time per iteration (ms): 13560.8 | learning rate: 1.278E-05 | global batch size: 16 | lm loss: 6.406487E+00 | loss scale: 16384.0 | grad norm: 66632.669 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2884/ 159576 | consumed samples: 46144 | elapsed time per iteration (ms): 13502.0 | learning rate: 1.278E-05 | global batch size: 16 | lm loss: 6.748409E+00 | loss scale: 16384.0 | grad norm: 69626.540 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2885/ 159576 | consumed samples: 46160 | elapsed time per iteration (ms): 13526.3 | learning rate: 1.279E-05 | global batch size: 16 | lm loss: 6.474768E+00 | loss scale: 16384.0 | grad norm: 43811.525 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2886/ 159576 | consumed samples: 46176 | elapsed time per iteration (ms): 13863.4 | learning rate: 1.279E-05 | global batch size: 16 | lm loss: 6.661960E+00 | loss scale: 16384.0 | grad norm: 71612.680 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2887/ 159576 | consumed samples: 46192 | elapsed time per iteration (ms): 13578.7 | learning rate: 1.279E-05 | global batch size: 16 | lm loss: 6.511534E+00 | loss scale: 16384.0 | grad norm: 60456.127 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2888/ 159576 | consumed samples: 46208 | elapsed time per iteration (ms): 13588.8 | learning rate: 1.280E-05 | global batch size: 16 | lm loss: 6.689698E+00 | loss scale: 16384.0 | grad norm: 101410.640 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2889/ 159576 | consumed samples: 46224 | elapsed time per iteration (ms): 13621.2 | learning rate: 1.280E-05 | global batch size: 16 | lm loss: 6.679986E+00 | loss scale: 16384.0 | grad norm: 74313.427 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2890/ 159576 | consumed samples: 46240 | elapsed time per iteration (ms): 13599.6 | learning rate: 1.281E-05 | global batch size: 16 | lm loss: 6.579202E+00 | loss scale: 16384.0 | grad norm: 53116.582 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2891/ 159576 | consumed samples: 46256 | elapsed time per iteration (ms): 13965.8 | learning rate: 1.281E-05 | global batch size: 16 | lm loss: 6.841757E+00 | loss scale: 16384.0 | grad norm: 71980.947 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2892/ 159576 | consumed samples: 46272 | elapsed time per iteration (ms): 13517.0 | learning rate: 1.282E-05 | global batch size: 16 | lm loss: 6.555973E+00 | loss scale: 16384.0 | grad norm: 90572.552 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2893/ 159576 | consumed samples: 46288 | elapsed time per iteration (ms): 13525.5 | learning rate: 1.282E-05 | global batch size: 16 | lm loss: 6.857316E+00 | loss scale: 16384.0 | grad norm: 60488.506 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2894/ 159576 | consumed samples: 46304 | elapsed time per iteration (ms): 13541.9 | learning rate: 1.283E-05 | global batch size: 16 | lm loss: 6.685534E+00 | loss scale: 16384.0 | grad norm: 69134.968 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2895/ 159576 | consumed samples: 46320 | elapsed time per iteration (ms): 14148.5 | learning rate: 1.283E-05 | global batch size: 16 | lm loss: 6.805571E+00 | loss scale: 16384.0 | grad norm: 57858.457 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2896/ 159576 | consumed samples: 46336 | elapsed time per iteration (ms): 13614.8 | learning rate: 1.283E-05 | global batch size: 16 | lm loss: 6.839938E+00 | loss scale: 16384.0 | grad norm: 146916.459 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2897/ 159576 | consumed samples: 46352 | elapsed time per iteration (ms): 13601.5 | learning rate: 1.284E-05 | global batch size: 16 | lm loss: 6.725083E+00 | loss scale: 16384.0 | grad norm: 101921.781 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2898/ 159576 | consumed samples: 46368 | elapsed time per iteration (ms): 13584.0 | learning rate: 1.284E-05 | global batch size: 16 | lm loss: 7.088351E+00 | loss scale: 16384.0 | grad norm: 78883.090 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2899/ 159576 | consumed samples: 46384 | elapsed time per iteration (ms): 14019.6 | learning rate: 1.285E-05 | global batch size: 16 | lm loss: 6.874489E+00 | loss scale: 16384.0 | grad norm: 79406.253 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2900/ 159576 | consumed samples: 46400 | elapsed time per iteration (ms): 13571.5 | learning rate: 1.285E-05 | global batch size: 16 | lm loss: 6.735637E+00 | loss scale: 16384.0 | grad norm: 58170.467 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2901/ 159576 | consumed samples: 46416 | elapsed time per iteration (ms): 13559.8 | learning rate: 1.286E-05 | global batch size: 16 | lm loss: 6.789194E+00 | loss scale: 16384.0 | grad norm: 153130.501 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2902/ 159576 | consumed samples: 46432 | elapsed time per iteration (ms): 13570.5 | learning rate: 1.286E-05 | global batch size: 16 | lm loss: 6.734316E+00 | loss scale: 16384.0 | grad norm: 116070.395 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2903/ 159576 | consumed samples: 46448 | elapsed time per iteration (ms): 13629.7 | learning rate: 1.287E-05 | global batch size: 16 | lm loss: 6.743185E+00 | loss scale: 16384.0 | grad norm: 76970.593 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2904/ 159576 | consumed samples: 46464 | elapsed time per iteration (ms): 13980.9 | learning rate: 1.287E-05 | global batch size: 16 | lm loss: 6.742231E+00 | loss scale: 16384.0 | grad norm: 79904.122 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2905/ 159576 | consumed samples: 46480 | elapsed time per iteration (ms): 13647.6 | learning rate: 1.287E-05 | global batch size: 16 | lm loss: 6.785865E+00 | loss scale: 16384.0 | grad norm: 66541.967 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2906/ 159576 | consumed samples: 46496 | elapsed time per iteration (ms): 13586.1 | learning rate: 1.288E-05 | global batch size: 16 | lm loss: 6.669911E+00 | loss scale: 16384.0 | grad norm: 76560.935 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2907/ 159576 | consumed samples: 46512 | elapsed time per iteration (ms): 13521.3 | learning rate: 1.288E-05 | global batch size: 16 | lm loss: 6.723244E+00 | loss scale: 16384.0 | grad norm: 103466.024 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2908/ 159576 | consumed samples: 46528 | elapsed time per iteration (ms): 13824.4 | learning rate: 1.289E-05 | global batch size: 16 | lm loss: 6.584032E+00 | loss scale: 16384.0 | grad norm: 73252.234 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2909/ 159576 | consumed samples: 46544 | elapsed time per iteration (ms): 13578.9 | learning rate: 1.289E-05 | global batch size: 16 | lm loss: 6.804316E+00 | loss scale: 16384.0 | grad norm: 70073.019 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2910/ 159576 | consumed samples: 46560 | elapsed time per iteration (ms): 13556.4 | learning rate: 1.290E-05 | global batch size: 16 | lm loss: 6.673604E+00 | loss scale: 16384.0 | grad norm: 109090.622 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2911/ 159576 | consumed samples: 46576 | elapsed time per iteration (ms): 13604.0 | learning rate: 1.290E-05 | global batch size: 16 | lm loss: 6.599095E+00 | loss scale: 16384.0 | grad norm: 57781.446 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2912/ 159576 | consumed samples: 46592 | elapsed time per iteration (ms): 13587.1 | learning rate: 1.291E-05 | global batch size: 16 | lm loss: 6.753370E+00 | loss scale: 16384.0 | grad norm: 76832.152 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2913/ 159576 | consumed samples: 46608 | elapsed time per iteration (ms): 13861.5 | learning rate: 1.291E-05 | global batch size: 16 | lm loss: 6.854298E+00 | loss scale: 16384.0 | grad norm: 72132.986 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2914/ 159576 | consumed samples: 46624 | elapsed time per iteration (ms): 13559.0 | learning rate: 1.291E-05 | global batch size: 16 | lm loss: 6.579864E+00 | loss scale: 16384.0 | grad norm: 74308.017 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2915/ 159576 | consumed samples: 46640 | elapsed time per iteration (ms): 13594.5 | learning rate: 1.292E-05 | global batch size: 16 | lm loss: 6.756865E+00 | loss scale: 16384.0 | grad norm: 54456.279 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2916/ 159576 | consumed samples: 46656 | elapsed time per iteration (ms): 13569.5 | learning rate: 1.292E-05 | global batch size: 16 | lm loss: 6.743901E+00 | loss scale: 16384.0 | grad norm: 55395.902 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2917/ 159576 | consumed samples: 46672 | elapsed time per iteration (ms): 13964.6 | learning rate: 1.293E-05 | global batch size: 16 | lm loss: 6.671132E+00 | loss scale: 16384.0 | grad norm: 82925.647 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2918/ 159576 | consumed samples: 46688 | elapsed time per iteration (ms): 13641.5 | learning rate: 1.293E-05 | global batch size: 16 | lm loss: 6.554927E+00 | loss scale: 16384.0 | grad norm: 64164.151 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2919/ 159576 | consumed samples: 46704 | elapsed time per iteration (ms): 13635.2 | learning rate: 1.294E-05 | global batch size: 16 | lm loss: 6.848719E+00 | loss scale: 16384.0 | grad norm: 67718.918 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2920/ 159576 | consumed samples: 46720 | elapsed time per iteration (ms): 13603.6 | learning rate: 1.294E-05 | global batch size: 16 | lm loss: 6.609835E+00 | loss scale: 16384.0 | grad norm: 64921.543 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2921/ 159576 | consumed samples: 46736 | elapsed time per iteration (ms): 13865.5 | learning rate: 1.295E-05 | global batch size: 16 | lm loss: 6.699195E+00 | loss scale: 16384.0 | grad norm: 76865.088 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2922/ 159576 | consumed samples: 46752 | elapsed time per iteration (ms): 13659.4 | learning rate: 1.295E-05 | global batch size: 16 | lm loss: 6.821632E+00 | loss scale: 16384.0 | grad norm: 105825.800 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2923/ 159576 | consumed samples: 46768 | elapsed time per iteration (ms): 13539.7 | learning rate: 1.295E-05 | global batch size: 16 | lm loss: 6.632296E+00 | loss scale: 16384.0 | grad norm: 85548.400 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2924/ 159576 | consumed samples: 46784 | elapsed time per iteration (ms): 13587.6 | learning rate: 1.296E-05 | global batch size: 16 | lm loss: 6.782111E+00 | loss scale: 16384.0 | grad norm: 64005.847 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2925/ 159576 | consumed samples: 46800 | elapsed time per iteration (ms): 13566.6 | learning rate: 1.296E-05 | global batch size: 16 | lm loss: 6.513734E+00 | loss scale: 16384.0 | grad norm: 74875.410 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2926/ 159576 | consumed samples: 46816 | elapsed time per iteration (ms): 13817.4 | learning rate: 1.297E-05 | global batch size: 16 | lm loss: 6.610899E+00 | loss scale: 16384.0 | grad norm: 69678.116 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2927/ 159576 | consumed samples: 46832 | elapsed time per iteration (ms): 13615.5 | learning rate: 1.297E-05 | global batch size: 16 | lm loss: 7.086233E+00 | loss scale: 16384.0 | grad norm: 70522.964 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2928/ 159576 | consumed samples: 46848 | elapsed time per iteration (ms): 13566.8 | learning rate: 1.298E-05 | global batch size: 16 | lm loss: 6.598146E+00 | loss scale: 16384.0 | grad norm: 103276.135 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2929/ 159576 | consumed samples: 46864 | elapsed time per iteration (ms): 13567.1 | learning rate: 1.298E-05 | global batch size: 16 | lm loss: 6.593244E+00 | loss scale: 16384.0 | grad norm: 78523.778 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2930/ 159576 | consumed samples: 46880 | elapsed time per iteration (ms): 13919.4 | learning rate: 1.299E-05 | global batch size: 16 | lm loss: 6.528622E+00 | loss scale: 16384.0 | grad norm: 82737.988 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2931/ 159576 | consumed samples: 46896 | elapsed time per iteration (ms): 13557.6 | learning rate: 1.299E-05 | global batch size: 16 | lm loss: 6.605000E+00 | loss scale: 16384.0 | grad norm: 68077.419 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2932/ 159576 | consumed samples: 46912 | elapsed time per iteration (ms): 13570.1 | learning rate: 1.299E-05 | global batch size: 16 | lm loss: 6.595417E+00 | loss scale: 16384.0 | grad norm: 84602.521 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2933/ 159576 | consumed samples: 46928 | elapsed time per iteration (ms): 13606.8 | learning rate: 1.300E-05 | global batch size: 16 | lm loss: 6.730010E+00 | loss scale: 16384.0 | grad norm: 85745.847 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2934/ 159576 | consumed samples: 46944 | elapsed time per iteration (ms): 13584.8 | learning rate: 1.300E-05 | global batch size: 16 | lm loss: 6.689770E+00 | loss scale: 16384.0 | grad norm: 62655.073 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2935/ 159576 | consumed samples: 46960 | elapsed time per iteration (ms): 14053.4 | learning rate: 1.301E-05 | global batch size: 16 | lm loss: 6.715128E+00 | loss scale: 16384.0 | grad norm: 65695.924 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2936/ 159576 | consumed samples: 46976 | elapsed time per iteration (ms): 13589.9 | learning rate: 1.301E-05 | global batch size: 16 | lm loss: 6.651369E+00 | loss scale: 16384.0 | grad norm: 55322.170 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2937/ 159576 | consumed samples: 46992 | elapsed time per iteration (ms): 13553.6 | learning rate: 1.302E-05 | global batch size: 16 | lm loss: 6.646598E+00 | loss scale: 16384.0 | grad norm: 105686.832 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2938/ 159576 | consumed samples: 47008 | elapsed time per iteration (ms): 13584.5 | learning rate: 1.302E-05 | global batch size: 16 | lm loss: 6.798124E+00 | loss scale: 16384.0 | grad norm: 62478.433 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2939/ 159576 | consumed samples: 47024 | elapsed time per iteration (ms): 13902.5 | learning rate: 1.303E-05 | global batch size: 16 | lm loss: 6.594469E+00 | loss scale: 16384.0 | grad norm: 66128.402 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2940/ 159576 | consumed samples: 47040 | elapsed time per iteration (ms): 13632.4 | learning rate: 1.303E-05 | global batch size: 16 | lm loss: 6.642596E+00 | loss scale: 16384.0 | grad norm: 70291.389 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2941/ 159576 | consumed samples: 47056 | elapsed time per iteration (ms): 13595.9 | learning rate: 1.303E-05 | global batch size: 16 | lm loss: 6.428228E+00 | loss scale: 16384.0 | grad norm: 88273.080 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2942/ 159576 | consumed samples: 47072 | elapsed time per iteration (ms): 13622.0 | learning rate: 1.304E-05 | global batch size: 16 | lm loss: 6.776118E+00 | loss scale: 16384.0 | grad norm: 66140.535 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2943/ 159576 | consumed samples: 47088 | elapsed time per iteration
(ms): 13949.2 | learning rate: 1.304E-05 | global batch size: 16 | lm loss: 6.678353E+00 | loss scale: 16384.0 | grad norm: 68411.066 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2944/ 159576 | consumed samples: 47104 | elapsed time per iteration (ms): 13581.2 | learning rate: 1.305E-05 | global batch size: 16 | lm loss: 6.679141E+00 | loss scale: 16384.0 | grad norm: 85622.880 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2945/ 159576 | consumed samples: 47120 | elapsed time per iteration (ms): 13544.3 | learning rate: 1.305E-05 | global batch size: 16 | lm loss: 6.620451E+00 | loss scale: 16384.0 | grad norm: 62226.279 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2946/ 159576 | consumed samples: 47136 | elapsed time per iteration (ms): 13593.9 | learning rate: 1.306E-05 | global batch size: 16 | lm loss: 6.719603E+00 | loss scale: 16384.0 | grad norm: 90885.483 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2947/ 159576 | consumed samples: 47152 | elapsed time per iteration (ms): 13604.3 | learning rate: 1.306E-05 | global batch size: 16 | lm loss: 6.704114E+00 | loss scale: 16384.0 | grad norm: 67182.295 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2948/ 159576 | consumed samples: 47168 | elapsed time per iteration (ms): 13746.5 | learning rate: 1.307E-05 | global batch size: 16 | lm loss: 6.781267E+00 | loss scale: 16384.0 | grad norm: 85616.234 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2949/ 159576 | consumed samples: 47184 | elapsed time per iteration (ms): 13612.1 | learning rate: 1.307E-05 | global batch size: 16 | lm loss: 6.878286E+00 | loss scale: 16384.0 | grad norm: 83807.291 | num zeros: 0.0 | number of 
skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2950/ 159576 | consumed samples: 47200 | elapsed time per iteration (ms): 13656.8 | learning rate: 1.307E-05 | global batch size: 16 | lm loss: 6.808831E+00 | loss scale: 16384.0 | grad norm: 99669.258 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2951/ 159576 | consumed samples: 47216 | elapsed time per iteration (ms): 13662.4 | learning rate: 1.308E-05 | global batch size: 16 | lm loss: 6.751644E+00 | loss scale: 16384.0 | grad norm: 60477.798 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2952/ 159576 | consumed samples: 47232 | elapsed time per iteration (ms): 13999.0 | learning rate: 1.308E-05 | global batch size: 16 | lm loss: 6.593210E+00 | loss scale: 16384.0 | grad norm: 72293.070 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2953/ 159576 | consumed samples: 47248 | elapsed time per iteration (ms): 13609.1 | learning rate: 1.309E-05 | global batch size: 16 | lm loss: 6.662547E+00 | loss scale: 16384.0 | grad norm: 49910.061 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2954/ 159576 | consumed samples: 47280 | elapsed time per iteration (ms): 14635.0 | learning rate: 1.310E-05 | global batch size: 32 | lm loss: 6.688079E+00 | loss scale: 16384.0 | grad norm: 111598.336 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2955/ 159576 | consumed samples: 47312 | elapsed time per iteration (ms): 14591.8 | learning rate: 1.311E-05 | global batch size: 32 | lm loss: 6.657289E+00 | loss scale: 16384.0 | grad norm: 67597.793 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2956/ 159576 | consumed samples: 47344 | elapsed time per iteration (ms): 15030.0 | learning 
rate: 1.311E-05 | global batch size: 32 | lm loss: 6.554570E+00 | loss scale: 16384.0 | grad norm: 69780.387 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2957/ 159576 | consumed samples: 47376 | elapsed time per iteration (ms): 14563.7 | learning rate: 1.312E-05 | global batch size: 32 | lm loss: 6.741304E+00 | loss scale: 16384.0 | grad norm: 58633.422 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2958/ 159576 | consumed samples: 47408 | elapsed time per iteration (ms): 14589.9 | learning rate: 1.313E-05 | global batch size: 32 | lm loss: 6.601515E+00 | loss scale: 16384.0 | grad norm: 107295.423 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2959/ 159576 | consumed samples: 47440 | elapsed time per iteration (ms): 14625.1 | learning rate: 1.314E-05 | global batch size: 32 | lm loss: 6.683945E+00 | loss scale: 16384.0 | grad norm: 81347.650 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2960/ 159576 | consumed samples: 47472 | elapsed time per iteration (ms): 14964.2 | learning rate: 1.315E-05 | global batch size: 32 | lm loss: 6.790781E+00 | loss scale: 16384.0 | grad norm: 77191.821 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2961/ 159576 | consumed samples: 47504 | elapsed time per iteration (ms): 14557.0 | learning rate: 1.316E-05 | global batch size: 32 | lm loss: 6.749201E+00 | loss scale: 16384.0 | grad norm: 82408.963 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2962/ 159576 | consumed samples: 47536 | elapsed time per iteration (ms): 14666.5 | learning rate: 1.317E-05 | global batch size: 32 | lm loss: 6.532114E+00 | loss scale: 16384.0 | grad norm: 51870.305 | num zeros: 0.0 | number of skipped iterations: 0 | number 
of nan iterations: 0 | time (ms) iteration 2963/ 159576 | consumed samples: 47568 | elapsed time per iteration (ms): 14537.9 | learning rate: 1.318E-05 | global batch size: 32 | lm loss: 6.660976E+00 | loss scale: 16384.0 | grad norm: 66392.838 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2964/ 159576 | consumed samples: 47600 | elapsed time per iteration (ms): 15078.8 | learning rate: 1.318E-05 | global batch size: 32 | lm loss: 6.526144E+00 | loss scale: 16384.0 | grad norm: 54716.550 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2965/ 159576 | consumed samples: 47632 | elapsed time per iteration (ms): 14737.9 | learning rate: 1.319E-05 | global batch size: 32 | lm loss: 6.649373E+00 | loss scale: 16384.0 | grad norm: 51359.795 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2966/ 159576 | consumed samples: 47664 | elapsed time per iteration (ms): 14559.9 | learning rate: 1.320E-05 | global batch size: 32 | lm loss: 6.672748E+00 | loss scale: 16384.0 | grad norm: 73789.982 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2967/ 159576 | consumed samples: 47696 | elapsed time per iteration (ms): 14642.3 | learning rate: 1.321E-05 | global batch size: 32 | lm loss: 6.662704E+00 | loss scale: 16384.0 | grad norm: 66303.151 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2968/ 159576 | consumed samples: 47728 | elapsed time per iteration (ms): 14852.7 | learning rate: 1.322E-05 | global batch size: 32 | lm loss: 6.624488E+00 | loss scale: 16384.0 | grad norm: 59052.609 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2969/ 159576 | consumed samples: 47760 | elapsed time per iteration (ms): 14836.6 | learning rate: 1.323E-05 | global batch 
size: 32 | lm loss: 6.600084E+00 | loss scale: 16384.0 | grad norm: 62547.253 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2970/ 159576 | consumed samples: 47792 | elapsed time per iteration (ms): 14593.7 | learning rate: 1.324E-05 | global batch size: 32 | lm loss: 6.517389E+00 | loss scale: 16384.0 | grad norm: 60694.546 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2971/ 159576 | consumed samples: 47824 | elapsed time per iteration (ms): 14618.4 | learning rate: 1.325E-05 | global batch size: 32 | lm loss: 6.548014E+00 | loss scale: 16384.0 | grad norm: 43913.010 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2972/ 159576 | consumed samples: 47856 | elapsed time per iteration (ms): 14695.6 | learning rate: 1.326E-05 | global batch size: 32 | lm loss: 6.593935E+00 | loss scale: 16384.0 | grad norm: 63488.321 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2973/ 159576 | consumed samples: 47888 | elapsed time per iteration (ms): 14827.1 | learning rate: 1.326E-05 | global batch size: 32 | lm loss: 6.572222E+00 | loss scale: 16384.0 | grad norm: 54368.202 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2974/ 159576 | consumed samples: 47920 | elapsed time per iteration (ms): 14620.6 | learning rate: 1.327E-05 | global batch size: 32 | lm loss: 6.550548E+00 | loss scale: 16384.0 | grad norm: 87940.074 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2975/ 159576 | consumed samples: 47952 | elapsed time per iteration (ms): 14622.4 | learning rate: 1.328E-05 | global batch size: 32 | lm loss: 6.529421E+00 | loss scale: 16384.0 | grad norm: 60145.649 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time 
(ms) iteration 2976/ 159576 | consumed samples: 47984 | elapsed time per iteration (ms): 14586.4 | learning rate: 1.329E-05 | global batch size: 32 | lm loss: 6.765855E+00 | loss scale: 16384.0 | grad norm: 83899.803 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2977/ 159576 | consumed samples: 48016 | elapsed time per iteration (ms): 14810.9 | learning rate: 1.330E-05 | global batch size: 32 | lm loss: 6.630699E+00 | loss scale: 16384.0 | grad norm: 44149.072 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2978/ 159576 | consumed samples: 48048 | elapsed time per iteration (ms): 14685.4 | learning rate: 1.331E-05 | global batch size: 32 | lm loss: 6.561995E+00 | loss scale: 16384.0 | grad norm: 87446.523 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2979/ 159576 | consumed samples: 48080 | elapsed time per iteration (ms): 14648.9 | learning rate: 1.332E-05 | global batch size: 32 | lm loss: 6.467924E+00 | loss scale: 16384.0 | grad norm: 65034.519 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2980/ 159576 | consumed samples: 48112 | elapsed time per iteration (ms): 14615.3 | learning rate: 1.333E-05 | global batch size: 32 | lm loss: 6.649030E+00 | loss scale: 16384.0 | grad norm: 92148.403 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2981/ 159576 | consumed samples: 48144 | elapsed time per iteration (ms): 14681.7 | learning rate: 1.334E-05 | global batch size: 32 | lm loss: 6.749784E+00 | loss scale: 16384.0 | grad norm: 61670.963 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2982/ 159576 | consumed samples: 48176 | elapsed time per iteration (ms): 14509.6 | learning rate: 1.334E-05 | global batch size: 32 | lm loss: 
6.567672E+00 | loss scale: 16384.0 | grad norm: 79628.022 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2983/ 159576 | consumed samples: 48208 | elapsed time per iteration (ms): 14555.2 | learning rate: 1.335E-05 | global batch size: 32 | lm loss: 6.676024E+00 | loss scale: 16384.0 | grad norm: 65136.709 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2984/ 159576 | consumed samples: 48240 | elapsed time per iteration (ms): 14572.2 | learning rate: 1.336E-05 | global batch size: 32 | lm loss: 6.467518E+00 | loss scale: 16384.0 | grad norm: 90637.625 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2985/ 159576 | consumed samples: 48272 | elapsed time per iteration (ms): 14888.7 | learning rate: 1.337E-05 | global batch size: 32 | lm loss: 6.586103E+00 | loss scale: 16384.0 | grad norm: 81306.452 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2986/ 159576 | consumed samples: 48304 | elapsed time per iteration (ms): 14588.0 | learning rate: 1.338E-05 | global batch size: 32 | lm loss: 6.541125E+00 | loss scale: 16384.0 | grad norm: 62368.768 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2987/ 159576 | consumed samples: 48336 | elapsed time per iteration (ms): 14597.9 | learning rate: 1.339E-05 | global batch size: 32 | lm loss: 6.591407E+00 | loss scale: 16384.0 | grad norm: 87504.003 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2988/ 159576 | consumed samples: 48368 | elapsed time per iteration (ms): 14590.3 | learning rate: 1.340E-05 | global batch size: 32 | lm loss: 6.678365E+00 | loss scale: 16384.0 | grad norm: 78293.170 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2989/ 
159576 | consumed samples: 48400 | elapsed time per iteration (ms): 15031.9 | learning rate: 1.341E-05 | global batch size: 32 | lm loss: 6.564939E+00 | loss scale: 16384.0 | grad norm: 77173.924 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2990/ 159576 | consumed samples: 48432 | elapsed time per iteration (ms): 14705.4 | learning rate: 1.342E-05 | global batch size: 32 | lm loss: 6.692814E+00 | loss scale: 16384.0 | grad norm: 57544.626 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2991/ 159576 | consumed samples: 48464 | elapsed time per iteration (ms): 14586.3 | learning rate: 1.342E-05 | global batch size: 32 | lm loss: 6.628499E+00 | loss scale: 16384.0 | grad norm: 75164.585 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2992/ 159576 | consumed samples: 48496 | elapsed time per iteration (ms): 14624.5 | learning rate: 1.343E-05 | global batch size: 32 | lm loss: 6.582328E+00 | loss scale: 16384.0 | grad norm: 79666.420 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2993/ 159576 | consumed samples: 48528 | elapsed time per iteration (ms): 14950.3 | learning rate: 1.344E-05 | global batch size: 32 | lm loss: 6.558386E+00 | loss scale: 16384.0 | grad norm: 55234.958 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2994/ 159576 | consumed samples: 48560 | elapsed time per iteration (ms): 14695.8 | learning rate: 1.345E-05 | global batch size: 32 | lm loss: 6.676173E+00 | loss scale: 16384.0 | grad norm: 99524.065 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2995/ 159576 | consumed samples: 48592 | elapsed time per iteration (ms): 14559.9 | learning rate: 1.346E-05 | global batch size: 32 | lm loss: 6.529976E+00 | loss scale: 
16384.0 | grad norm: 70438.746 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2996/ 159576 | consumed samples: 48624 | elapsed time per iteration (ms): 14633.0 | learning rate: 1.347E-05 | global batch size: 32 | lm loss: 6.703721E+00 | loss scale: 16384.0 | grad norm: 67881.530 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2997/ 159576 | consumed samples: 48656 | elapsed time per iteration (ms): 14953.9 | learning rate: 1.348E-05 | global batch size: 32 | lm loss: 6.681076E+00 | loss scale: 16384.0 | grad norm: 65351.573 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2998/ 159576 | consumed samples: 48688 | elapsed time per iteration (ms): 14564.8 | learning rate: 1.349E-05 | global batch size: 32 | lm loss: 6.741710E+00 | loss scale: 16384.0 | grad norm: 80607.424 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2999/ 159576 | consumed samples: 48720 | elapsed time per iteration (ms): 14589.6 | learning rate: 1.350E-05 | global batch size: 32 | lm loss: 6.628071E+00 | loss scale: 16384.0 | grad norm: 80325.390 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3000/ 159576 | consumed samples: 48752 | elapsed time per iteration (ms): 14580.8 | learning rate: 1.350E-05 | global batch size: 32 | lm loss: 6.770764E+00 | loss scale: 16384.0 | grad norm: 62881.195 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
------------------------------------------------------------------------------------------------
validation loss at iteration 3000 | lm loss value: 6.650529E+00 | lm loss PPL: 7.731932E+02 |
------------------------------------------------------------------------------------------------
saving checkpoint at iteration 3000 to /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints
[2021-09-24 13:45:23,476] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/global_step3000/mp_rank_00_model_states.pt
successfully saved checkpoint at iteration 3000 to /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints
time (ms) | save-checkpoint: 19970.36
iteration 3001/ 159576 | consumed samples: 48784 | elapsed time per iteration (ms): 40923.5 | learning rate: 1.351E-05 | global batch size: 32 | lm loss: 6.512466E+00 | loss scale: 16384.0 | grad norm: 78526.093 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3002/ 159576 | consumed samples: 48816 | elapsed time per iteration (ms): 14454.5 | learning rate: 1.352E-05 | global batch size: 32 | lm loss: 6.725769E+00 | loss scale: 16384.0 | grad norm: 52532.916 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3003/ 159576 | consumed samples: 48848 | elapsed time per iteration (ms): 14508.9 | learning rate: 1.353E-05 | global batch size: 32 | lm loss: 6.778904E+00 | loss scale: 16384.0 | grad norm: 61815.208 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3004/ 159576 | consumed samples: 48880 | elapsed time per iteration (ms): 14774.8 | learning rate: 1.354E-05 | global batch size: 32 | lm loss: 6.600959E+00 | loss scale: 16384.0 | grad norm: 72563.840 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3005/ 159576 | consumed samples: 48912 | elapsed time per iteration (ms): 14543.7 | learning rate: 1.355E-05 | global batch size: 32 | lm loss: 6.630536E+00 | loss scale: 16384.0 | grad norm: 52120.360 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3006/ 159576 | consumed samples: 48944 | elapsed time
per iteration (ms): 14501.8 | learning rate: 1.356E-05 | global batch size: 32 | lm loss: 6.661976E+00 | loss scale: 16384.0 | grad norm: 60799.900 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3007/ 159576 | consumed samples: 48976 | elapsed time per iteration (ms): 14465.0 | learning rate: 1.357E-05 | global batch size: 32 | lm loss: 6.695879E+00 | loss scale: 16384.0 | grad norm: 55470.787 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3008/ 159576 | consumed samples: 49008 | elapsed time per iteration (ms): 14696.5 | learning rate: 1.358E-05 | global batch size: 32 | lm loss: 6.613426E+00 | loss scale: 16384.0 | grad norm: 80502.152 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3009/ 159576 | consumed samples: 49040 | elapsed time per iteration (ms): 14441.9 | learning rate: 1.358E-05 | global batch size: 32 | lm loss: 6.640174E+00 | loss scale: 16384.0 | grad norm: 53100.335 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3010/ 159576 | consumed samples: 49072 | elapsed time per iteration (ms): 14484.3 | learning rate: 1.359E-05 | global batch size: 32 | lm loss: 6.660203E+00 | loss scale: 16384.0 | grad norm: 69573.492 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3011/ 159576 | consumed samples: 49104 | elapsed time per iteration (ms): 14599.1 | learning rate: 1.360E-05 | global batch size: 32 | lm loss: 6.674448E+00 | loss scale: 16384.0 | grad norm: 49737.183 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3012/ 159576 | consumed samples: 49136 | elapsed time per iteration (ms): 14701.4 | learning rate: 1.361E-05 | global batch size: 32 | lm loss: 6.607582E+00 | loss scale: 16384.0 | grad norm: 121923.648 | num zeros: 0.0 | 
number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3013/ 159576 | consumed samples: 49168 | elapsed time per iteration (ms): 14527.2 | learning rate: 1.362E-05 | global batch size: 32 | lm loss: 6.552118E+00 | loss scale: 16384.0 | grad norm: 86117.234 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3014/ 159576 | consumed samples: 49200 | elapsed time per iteration (ms): 14528.7 | learning rate: 1.363E-05 | global batch size: 32 | lm loss: 6.628557E+00 | loss scale: 16384.0 | grad norm: 65341.685 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3015/ 159576 | consumed samples: 49232 | elapsed time per iteration (ms): 14528.2 | learning rate: 1.364E-05 | global batch size: 32 | lm loss: 6.637073E+00 | loss scale: 16384.0 | grad norm: 56388.918 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3016/ 159576 | consumed samples: 49264 | elapsed time per iteration (ms): 14818.6 | learning rate: 1.365E-05 | global batch size: 32 | lm loss: 6.643037E+00 | loss scale: 16384.0 | grad norm: 92476.269 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3017/ 159576 | consumed samples: 49296 | elapsed time per iteration (ms): 14532.4 | learning rate: 1.366E-05 | global batch size: 32 | lm loss: 6.517512E+00 | loss scale: 16384.0 | grad norm: 69528.273 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3018/ 159576 | consumed samples: 49328 | elapsed time per iteration (ms): 14482.9 | learning rate: 1.366E-05 | global batch size: 32 | lm loss: 6.593336E+00 | loss scale: 16384.0 | grad norm: 58227.816 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3019/ 159576 | consumed samples: 49360 | elapsed time per iteration (ms): 14483.3 | 
learning rate: 1.367E-05 | global batch size: 32 | lm loss: 6.682046E+00 | loss scale: 16384.0 | grad norm: 77807.619 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3020/ 159576 | consumed samples: 49392 | elapsed time per iteration (ms): 15039.4 | learning rate: 1.368E-05 | global batch size: 32 | lm loss: 6.511760E+00 | loss scale: 16384.0 | grad norm: 61711.980 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3021/ 159576 | consumed samples: 49424 | elapsed time per iteration (ms): 14532.3 | learning rate: 1.369E-05 | global batch size: 32 | lm loss: 6.601027E+00 | loss scale: 16384.0 | grad norm: 59045.542 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3022/ 159576 | consumed samples: 49456 | elapsed time per iteration (ms): 14411.9 | learning rate: 1.370E-05 | global batch size: 32 | lm loss: 6.669757E+00 | loss scale: 16384.0 | grad norm: 79072.886 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3023/ 159576 | consumed samples: 49488 | elapsed time per iteration (ms): 14433.5 | learning rate: 1.371E-05 | global batch size: 32 | lm loss: 6.660283E+00 | loss scale: 16384.0 | grad norm: 83581.808 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3024/ 159576 | consumed samples: 49520 | elapsed time per iteration (ms): 14915.2 | learning rate: 1.372E-05 | global batch size: 32 | lm loss: 6.621551E+00 | loss scale: 16384.0 | grad norm: 64854.144 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3025/ 159576 | consumed samples: 49552 | elapsed time per iteration (ms): 14425.9 | learning rate: 1.373E-05 | global batch size: 32 | lm loss: 6.591113E+00 | loss scale: 16384.0 | grad norm: 52620.079 | num zeros: 0.0 | number of skipped iterations: 0 
| number of nan iterations: 0 | time (ms)
iteration 3026/ 159576 | consumed samples: 49584 | elapsed time per iteration (ms): 14542.0 | learning rate: 1.374E-05 | global batch size: 32 | lm loss: 6.659728E+00 | loss scale: 16384.0 | grad norm: 50471.019 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3027/ 159576 | consumed samples: 49616 | elapsed time per iteration (ms): 14493.7 | learning rate: 1.374E-05 | global batch size: 32 | lm loss: 6.786015E+00 | loss scale: 16384.0 | grad norm: 89599.838 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3028/ 159576 | consumed samples: 49648 | elapsed time per iteration (ms): 14955.9 | learning rate: 1.375E-05 | global batch size: 32 | lm loss: 6.515626E+00 | loss scale: 16384.0 | grad norm: 71757.893 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3029/ 159576 | consumed samples: 49680 | elapsed time per iteration (ms): 14451.8 | learning rate: 1.376E-05 | global batch size: 32 | lm loss: 6.552487E+00 | loss scale: 16384.0 | grad norm: 59493.240 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3030/ 159576 | consumed samples: 49712 | elapsed time per iteration (ms): 14565.2 | learning rate: 1.377E-05 | global batch size: 32 | lm loss: 6.515723E+00 | loss scale: 16384.0 | grad norm: 70621.618 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3031/ 159576 | consumed samples: 49744 | elapsed time per iteration (ms): 14573.9 | learning rate: 1.378E-05 | global batch size: 32 | lm loss: 6.533678E+00 | loss scale: 16384.0 | grad norm: 67416.578 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3032/ 159576 | consumed samples: 49776 | elapsed time per iteration (ms): 14838.7 | learning rate: 1.379E-05 | global batch size: 32 | lm loss: 6.558086E+00 | loss scale: 16384.0 | grad norm: 57733.715 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3033/ 159576 | consumed samples: 49808 | elapsed time per iteration (ms): 14602.8 | learning rate: 1.380E-05 | global batch size: 32 | lm loss: 6.520467E+00 | loss scale: 16384.0 | grad norm: 82103.090 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3034/ 159576 | consumed samples: 49840 | elapsed time per iteration (ms): 14562.2 | learning rate: 1.381E-05 | global batch size: 32 | lm loss: 6.583010E+00 | loss scale: 16384.0 | grad norm: 49461.985 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3035/ 159576 | consumed samples: 49872 | elapsed time per iteration (ms): 14551.2 | learning rate: 1.382E-05 | global batch size: 32 | lm loss: 6.614191E+00 | loss scale: 16384.0 | grad norm: 42934.432 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3036/ 159576 | consumed samples: 49904 | elapsed time per iteration (ms): 15033.1 | learning rate: 1.382E-05 | global batch size: 32 | lm loss: 6.646058E+00 | loss scale: 16384.0 | grad norm: 72475.817 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3037/ 159576 | consumed samples: 49936 | elapsed time per iteration (ms): 14506.7 | learning rate: 1.383E-05 | global batch size: 32 | lm loss: 6.657450E+00 | loss scale: 16384.0 | grad norm: 51862.308 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3038/ 159576 | consumed samples: 49968 | elapsed time per iteration (ms): 14535.4 | learning rate: 1.384E-05 | global batch size: 32 | lm loss: 6.474831E+00 | loss scale: 16384.0 | grad norm: 54826.780 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3039/ 159576 | consumed samples: 50000 | elapsed time per iteration (ms): 14517.2 | learning rate: 1.385E-05 | global batch size: 32 | lm loss: 6.491888E+00 | loss scale: 16384.0 | grad norm: 48045.161 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3040/ 159576 | consumed samples: 50032 | elapsed time per iteration (ms): 14679.0 | learning rate: 1.386E-05 | global batch size: 32 | lm loss: 6.557182E+00 | loss scale: 16384.0 | grad norm: 79148.314 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3041/ 159576 | consumed samples: 50064 | elapsed time per iteration (ms): 14829.2 | learning rate: 1.387E-05 | global batch size: 32 | lm loss: 6.624621E+00 | loss scale: 16384.0 | grad norm: 50930.400 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3042/ 159576 | consumed samples: 50096 | elapsed time per iteration (ms): 14560.9 | learning rate: 1.388E-05 | global batch size: 32 | lm loss: 6.572658E+00 | loss scale: 16384.0 | grad norm: 72539.370 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3043/ 159576 | consumed samples: 50128 | elapsed time per iteration (ms): 14616.0 | learning rate: 1.389E-05 | global batch size: 32 | lm loss: 6.654581E+00 | loss scale: 16384.0 | grad norm: 66089.217 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3044/ 159576 | consumed samples: 50160 | elapsed time per iteration (ms): 14597.6 | learning rate: 1.389E-05 | global batch size: 32 | lm loss: 6.568760E+00 | loss scale: 16384.0 | grad norm: 77389.521 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3045/ 159576 | consumed samples: 50192 | elapsed time per iteration (ms): 14717.8 | learning rate: 1.390E-05 | global batch size: 32 | lm loss: 6.562954E+00 | loss scale: 16384.0 | grad norm: 59175.329 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3046/ 159576 | consumed samples: 50224 | elapsed time per iteration (ms): 14549.8 | learning rate: 1.391E-05 | global batch size: 32 | lm loss: 6.519083E+00 | loss scale: 16384.0 | grad norm: 72573.917 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3047/ 159576 | consumed samples: 50256 | elapsed time per iteration (ms): 14547.8 | learning rate: 1.392E-05 | global batch size: 32 | lm loss: 6.586189E+00 | loss scale: 16384.0 | grad norm: 63454.734 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3048/ 159576 | consumed samples: 50288 | elapsed time per iteration (ms): 14699.8 | learning rate: 1.393E-05 | global batch size: 32 | lm loss: 6.629214E+00 | loss scale: 16384.0 | grad norm: 49137.706 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3049/ 159576 | consumed samples: 50320 | elapsed time per iteration (ms): 14760.5 | learning rate: 1.394E-05 | global batch size: 32 | lm loss: 6.567476E+00 | loss scale: 16384.0 | grad norm: 59423.684 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3050/ 159576 | consumed samples: 50352 | elapsed time per iteration (ms): 14605.2 | learning rate: 1.395E-05 | global batch size: 32 | lm loss: 6.560441E+00 | loss scale: 16384.0 | grad norm: 76106.960 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3051/ 159576 | consumed samples: 50384 | elapsed time per iteration (ms): 14589.0 | learning rate: 1.396E-05 | global batch size: 32 | lm loss: 6.676329E+00 | loss scale: 16384.0 | grad norm: 43490.816 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3052/ 159576 | consumed samples: 50416 | elapsed time per iteration (ms): 14546.5 | learning rate: 1.397E-05 | global batch size: 32 | lm loss: 6.531154E+00 | loss scale: 16384.0 | grad norm: 77324.537 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3053/ 159576 | consumed samples: 50448 | elapsed time per iteration (ms): 14689.5 | learning rate: 1.397E-05 | global batch size: 32 | lm loss: 6.457368E+00 | loss scale: 16384.0 | grad norm: 61005.030 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3054/ 159576 | consumed samples: 50480 | elapsed time per iteration (ms): 14604.5 | learning rate: 1.398E-05 | global batch size: 32 | lm loss: 6.694659E+00 | loss scale: 16384.0 | grad norm: 50570.324 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3055/ 159576 | consumed samples: 50512 | elapsed time per iteration (ms): 14507.3 | learning rate: 1.399E-05 | global batch size: 32 | lm loss: 6.639795E+00 | loss scale: 16384.0 | grad norm: 57017.274 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3056/ 159576 | consumed samples: 50544 | elapsed time per iteration (ms): 14581.4 | learning rate: 1.400E-05 | global batch size: 32 | lm loss: 6.619573E+00 | loss scale: 16384.0 | grad norm: 60323.172 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3057/ 159576 | consumed samples: 50576 | elapsed time per iteration (ms): 15078.3 | learning rate: 1.401E-05 | global batch size: 32 | lm loss: 6.636419E+00 | loss scale: 16384.0 | grad norm: 49598.220 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3058/ 159576 | consumed samples: 50608 | elapsed time per iteration (ms): 14576.1 | learning rate: 1.402E-05 | global batch size: 32 | lm loss: 6.591126E+00 | loss scale: 16384.0 | grad norm: 102052.253 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3059/ 159576 | consumed samples: 50640 | elapsed time per iteration (ms): 14515.1 | learning rate: 1.403E-05 | global batch size: 32 | lm loss: 6.500241E+00 | loss scale: 16384.0 | grad norm: 52981.506 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3060/ 159576 | consumed samples: 50672 | elapsed time per iteration (ms): 14582.7 | learning rate: 1.404E-05 | global batch size: 32 | lm loss: 6.553960E+00 | loss scale: 16384.0 | grad norm: 57341.020 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3061/ 159576 | consumed samples: 50704 | elapsed time per iteration (ms): 14939.5 | learning rate: 1.405E-05 | global batch size: 32 | lm loss: 6.593186E+00 | loss scale: 16384.0 | grad norm: 50198.577 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3062/ 159576 | consumed samples: 50736 | elapsed time per iteration (ms): 14545.7 | learning rate: 1.405E-05 | global batch size: 32 | lm loss: 6.577888E+00 | loss scale: 16384.0 | grad norm: 90008.008 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3063/ 159576 | consumed samples: 50768 | elapsed time per iteration (ms): 14515.8 | learning rate: 1.406E-05 | global batch size: 32 | lm loss: 6.775355E+00 | loss scale: 16384.0 | grad norm: 52343.221 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3064/ 159576 | consumed samples: 50800 | elapsed time per iteration (ms): 14570.2 | learning rate: 1.407E-05 | global batch size: 32 | lm loss: 6.724249E+00 | loss scale: 16384.0 | grad norm: 69939.860 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3065/ 159576 | consumed samples: 50832 | elapsed time per iteration (ms): 14913.0 | learning rate: 1.408E-05 | global batch size: 32 | lm loss: 6.634195E+00 | loss scale: 16384.0 | grad norm: 70070.520 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3066/ 159576 | consumed samples: 50864 | elapsed time per iteration (ms): 14497.8 | learning rate: 1.409E-05 | global batch size: 32 | lm loss: 6.591150E+00 | loss scale: 16384.0 | grad norm: 80109.931 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3067/ 159576 | consumed samples: 50896 | elapsed time per iteration (ms): 14593.4 | learning rate: 1.410E-05 | global batch size: 32 | lm loss: 6.637640E+00 | loss scale: 16384.0 | grad norm: 51104.322 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3068/ 159576 | consumed samples: 50928 | elapsed time per iteration (ms): 14459.7 | learning rate: 1.411E-05 | global batch size: 32 | lm loss: 6.595787E+00 | loss scale: 16384.0 | grad norm: 49458.599 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3069/ 159576 | consumed samples: 50960 | elapsed time per iteration (ms): 14904.6 | learning rate: 1.412E-05 | global batch size: 32 | lm loss: 6.762650E+00 | loss scale: 16384.0 | grad norm: 88087.529 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3070/ 159576 | consumed samples: 50992 | elapsed time per iteration (ms): 14578.7 | learning rate: 1.413E-05 | global batch size: 32 | lm loss: 6.615232E+00 | loss scale: 16384.0 | grad norm: 50851.426 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3071/ 159576 | consumed samples: 51024 | elapsed time per iteration (ms): 14534.9 | learning rate: 1.413E-05 | global batch size: 32 | lm loss: 6.502337E+00 | loss scale: 16384.0 | grad norm: 82199.193 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3072/ 159576 | consumed samples: 51056 | elapsed time per iteration (ms): 14555.3 | learning rate: 1.414E-05 | global batch size: 32 | lm loss: 6.552182E+00 | loss scale: 16384.0 | grad norm: 67542.628 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3073/ 159576 | consumed samples: 51088 | elapsed time per iteration (ms): 15069.2 | learning rate: 1.415E-05 | global batch size: 32 | lm loss: 6.449011E+00 | loss scale: 16384.0 | grad norm: 113973.285 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3074/ 159576 | consumed samples: 51120 | elapsed time per iteration (ms): 14473.5 | learning rate: 1.416E-05 | global batch size: 32 | lm loss: 6.462796E+00 | loss scale: 16384.0 | grad norm: 99530.753 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3075/ 159576 | consumed samples: 51152 | elapsed time per iteration (ms): 14578.5 | learning rate: 1.417E-05 | global batch size: 32 | lm loss: 6.605415E+00 | loss scale: 16384.0 | grad norm: 79580.590 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3076/ 159576 | consumed samples: 51184 | elapsed time per iteration (ms): 14526.0 | learning rate: 1.418E-05 | global batch size: 32 | lm loss: 6.643724E+00 | loss scale: 16384.0 | grad norm: 83910.537 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3077/ 159576 | consumed samples: 51216 | elapsed time per iteration (ms): 14932.5 | learning rate: 1.419E-05 | global batch size: 32 | lm loss: 6.554170E+00 | loss scale: 16384.0 | grad norm: 41888.605 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3078/ 159576 | consumed samples: 51248 | elapsed time per iteration (ms): 14631.5 | learning rate: 1.420E-05 | global batch size: 32 | lm loss: 6.609428E+00 | loss scale: 16384.0 | grad norm: 100795.398 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3079/ 159576 | consumed samples: 51280 | elapsed time per iteration (ms): 14613.6 | learning rate: 1.421E-05 | global batch size: 32 | lm loss: 6.647438E+00 | loss scale: 16384.0 | grad norm: 79478.930 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3080/ 159576 | consumed samples: 51312 | elapsed time per iteration (ms): 14624.3 | learning rate: 1.421E-05 | global batch size: 32 | lm loss: 6.548526E+00 | loss scale: 16384.0 | grad norm: 61687.326 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3081/ 159576 | consumed samples: 51344 | elapsed time per iteration (ms): 14941.2 | learning rate: 1.422E-05 | global batch size: 32 | lm loss: 6.559642E+00 | loss scale: 16384.0 | grad norm: 51017.983 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3082/ 159576 | consumed samples: 51376 | elapsed time per iteration (ms): 14650.5 | learning rate: 1.423E-05 | global batch size: 32 | lm loss: 6.513590E+00 | loss scale: 16384.0 | grad norm: 62838.209 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3083/ 159576 | consumed samples: 51408 | elapsed time per iteration (ms): 14595.1 | learning rate: 1.424E-05 | global batch size: 32 | lm loss: 6.454400E+00 | loss scale: 16384.0 | grad norm: 85218.080 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3084/ 159576 | consumed samples: 51440 | elapsed time per iteration (ms): 14539.5 | learning rate: 1.425E-05 | global batch size: 32 | lm loss: 6.667971E+00 | loss scale: 16384.0 | grad norm: 74883.565 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3085/ 159576 | consumed samples: 51472 | elapsed time per iteration (ms): 14496.8 | learning rate: 1.426E-05 | global batch size: 32 | lm loss: 6.608503E+00 | loss scale: 16384.0 | grad norm: 64204.771 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3086/ 159576 | consumed samples: 51504 | elapsed time per iteration (ms): 14686.0 | learning rate: 1.427E-05 | global batch size: 32 | lm loss: 6.699879E+00 | loss scale: 16384.0 | grad norm: 42613.436 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
[2021-09-24 14:06:36] PULSE: tr8-104B is waiting for the previous job to finish before scheduling a new one using the dependency mechanism (1165978_[1-10%1] on 'gpu_p13' partition)
[2021-09-24 14:06:36] PULSE: tr8-104B is running for 8:14:25 since 2021-09-24T05:52:11 (1162855_1 on 'gpu_p13' partition (r6i4n[5,7],r6i5n[2,7-8],r6i6n[0,2,6],r7i2n[4-5],r7i6n[2-4],r7i7n[7-8],r8i0n[2-3,5-8],r8i1n[0,2-4],r8i2n8,r8i3n[0-2],r8i5n[3-4],r8i7n[3-8],r9i0n[0-2],r9i1n[0-3],r9i2n[3-5,8],r9i3n[0-1,7-8],r9i4n[0-2],r9i5n[3-8],r9i6n[0,7-8])
iteration 3087/ 159576 | consumed samples: 51536 | elapsed time per iteration (ms): 14518.6 | learning rate: 1.428E-05 | global batch size: 32 | lm loss: 6.539448E+00 | loss scale: 16384.0 | grad norm: 88063.533 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3088/ 159576 | consumed samples: 51568 | elapsed time per iteration (ms): 14588.4 | learning rate: 1.429E-05 | global batch size: 32 | lm loss: 6.589184E+00 | loss scale: 16384.0 | grad norm: 54256.309 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3089/ 159576 | consumed samples: 51600 | elapsed time per iteration (ms): 14631.0 | learning rate: 1.429E-05 | global batch size: 32 | lm loss: 6.700484E+00 | loss scale: 16384.0 | grad norm: 54269.384 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3090/ 159576 | consumed samples: 51632 | elapsed time per iteration (ms): 14830.4 | learning rate: 1.430E-05 | global batch size: 32 | lm loss: 6.576167E+00 | loss scale: 16384.0 | grad norm: 57490.220 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3091/ 159576 | consumed samples: 51664 | elapsed time per iteration (ms): 14445.4 | learning rate: 1.431E-05 | global batch size: 32 | lm loss: 6.601985E+00 | loss scale: 16384.0 | grad norm: 57872.821 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3092/ 159576 | consumed samples: 51696 | elapsed time per iteration (ms): 14536.8 | learning rate: 1.432E-05 | global batch size: 32 | lm loss: 6.407238E+00 | loss scale: 16384.0 | grad norm: 52047.068 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3093/ 159576 | consumed samples: 51728 | elapsed time per iteration (ms): 14606.0 | learning rate: 1.433E-05 | global batch size: 32 | lm loss: 6.659007E+00 | loss scale: 16384.0 | grad norm: 76903.539 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3094/ 159576 | consumed samples: 51760 | elapsed time per iteration (ms): 14751.8 | learning rate: 1.434E-05 | global batch size: 32 | lm loss: 6.623207E+00 | loss scale: 16384.0 | grad norm: 98639.349 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3095/ 159576 | consumed samples: 51792 | elapsed time per iteration (ms): 14636.3 | learning rate: 1.435E-05 | global batch size: 32 | lm loss: 6.697064E+00 | loss scale: 16384.0 | grad norm: 59113.249 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3096/ 159576 | consumed samples: 51824 | elapsed time per iteration (ms): 14701.7 | learning rate: 1.436E-05 | global batch size: 32 | lm loss: 6.510694E+00 | loss scale: 16384.0 | grad norm: 57025.627 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3097/ 159576 | consumed samples: 51856 | elapsed time per iteration (ms): 14643.0 | learning rate: 1.437E-05 | global batch size: 32 | lm loss: 6.610021E+00 | loss scale: 16384.0 | grad norm: 90059.366 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3098/ 159576 | consumed samples: 51888 | elapsed time per iteration (ms): 14837.7 | learning rate: 1.437E-05 | global batch size: 32 | lm loss: 6.534551E+00 | loss scale: 16384.0 | grad norm: 45874.846 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3099/ 159576 | consumed samples: 51920 | elapsed time per iteration (ms): 14607.4 | learning rate: 1.438E-05 | global batch size: 32 | lm loss: 6.517954E+00 | loss scale: 16384.0 | grad norm: 60226.775 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3100/ 159576 | consumed samples: 51952 | elapsed time per iteration (ms): 14537.4 | learning rate: 1.439E-05 | global batch size: 32 | lm loss: 6.457252E+00 | loss scale: 16384.0 | grad norm: 46090.904 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3101/ 159576 | consumed samples: 51984 | elapsed time per iteration (ms): 14526.9 | learning rate: 1.440E-05 | global batch size: 32 | lm loss: 6.609892E+00 | loss scale: 16384.0 | grad norm: 94724.964 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3102/ 159576 | consumed samples: 52016 | elapsed time per iteration (ms): 14927.9 | learning rate: 1.441E-05 | global batch size: 32 | lm loss: 6.698421E+00 | loss scale: 16384.0 | grad norm: 87402.445 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3103/ 159576 | consumed samples: 52048 | elapsed time per iteration (ms): 14723.0 | learning rate: 1.442E-05 | global batch size: 32 | lm loss: 6.607485E+00 | loss scale: 16384.0 | grad norm: 53552.486 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3104/ 159576 | consumed samples: 52080 | elapsed time per iteration (ms): 14655.6 | learning rate: 1.443E-05 | global batch size: 32 | lm loss: 6.771776E+00 | loss scale: 16384.0 | grad norm: 77470.084 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3105/ 159576 | consumed samples: 52112 | elapsed time per iteration (ms): 14632.7 | learning rate: 1.444E-05 | global batch size: 32 | lm loss: 6.573309E+00 | loss scale: 16384.0 | grad norm: 60932.025 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3106/ 159576 | consumed samples: 52144 | elapsed time per iteration (ms): 15115.7 | learning rate: 1.445E-05 | global batch size: 32 | lm loss: 6.610741E+00 | loss scale: 16384.0 | grad norm: 67949.607 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3107/ 159576 | consumed samples: 52176 | elapsed time per iteration (ms): 14559.3 | learning rate: 1.445E-05 | global batch size: 32 | lm loss: 6.538753E+00 | loss scale: 16384.0 | grad norm: 71734.909 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3108/ 159576 | consumed samples: 52208 | elapsed time per iteration (ms): 14588.4 | learning rate: 1.446E-05 | global batch size: 32 | lm loss: 6.527990E+00 | loss scale: 16384.0 | grad norm: 86170.275 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3109/ 159576 | consumed samples: 52240 | elapsed time per iteration (ms): 14660.3 | learning rate: 1.447E-05 | global batch size: 32 | lm loss: 6.556553E+00 | loss scale: 16384.0 | grad norm: 46751.259 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3110/ 159576 | consumed samples: 52272 | elapsed time per iteration (ms): 15046.4 | learning rate: 1.448E-05 | global batch size: 32 | lm loss: 6.566851E+00 | loss scale: 16384.0 | grad norm: 67209.417 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3111/ 159576 | consumed samples: 52304 | elapsed time per iteration (ms): 14570.9 | learning rate: 1.449E-05 | global batch size: 32 | lm loss: 6.635989E+00 | loss scale: 16384.0 | grad norm: 53538.451 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3112/ 159576 | consumed samples: 52336 | elapsed time per iteration (ms): 14664.0 | learning rate: 1.450E-05 | global batch size: 32 | lm loss: 6.739109E+00 | loss scale: 16384.0 | grad norm: 100581.395 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3113/ 159576 | consumed samples: 52368 | elapsed time per iteration (ms): 14690.0 | learning rate: 1.451E-05 | global batch size: 32 | lm loss: 6.534431E+00 | loss scale: 16384.0 | grad norm: 69366.573 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3114/ 159576 | consumed samples: 52400 | elapsed time per iteration (ms): 14854.6 | learning rate: 1.452E-05 | global batch size: 32 | lm loss: 6.481595E+00 | loss scale: 16384.0 | grad norm: 57933.457 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3115/ 159576 | consumed samples: 52432 | elapsed time per iteration (ms): 14581.0 | learning rate: 1.453E-05 | global batch size: 32 | lm loss: 6.466241E+00 | loss scale: 16384.0 | grad norm: 91764.400 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3116/ 159576 | consumed samples: 52464 | elapsed time per iteration (ms): 14603.8 | learning rate: 1.453E-05 | global batch size: 32 | lm loss: 6.818060E+00 | loss scale: 16384.0 | grad norm: 73322.908 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3117/ 159576 | consumed samples: 52496 | elapsed time per iteration (ms): 14655.4 | learning rate: 1.454E-05 | global batch size: 32 | lm loss: 6.541664E+00 | loss scale: 16384.0 | grad norm: 79876.153 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3118/ 159576 | consumed samples: 52528 | elapsed time per iteration (ms): 15059.6 | learning rate: 1.455E-05 | global batch size: 32 | lm loss: 6.582567E+00 | loss scale: 16384.0 | grad norm: 57737.032 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3119/ 159576 | consumed samples: 52560 | elapsed time per iteration (ms): 14561.2 | learning rate: 1.456E-05 | global batch size: 32 | lm loss: 6.616435E+00 | loss scale: 16384.0 | grad norm: 75078.207 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3120/ 159576 | consumed samples: 52592 | elapsed time per iteration (ms): 14627.9 | learning rate: 1.457E-05 | global batch size: 32 | lm loss: 6.688129E+00 | loss scale: 16384.0 | grad norm: 51450.549 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3121/ 159576 | consumed samples: 52624 | elapsed time per iteration (ms): 14579.2 | learning rate: 1.458E-05 | global batch size: 32 | lm loss: 6.456697E+00 | loss scale: 16384.0 | grad norm: 69973.541 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3122/ 159576 | consumed samples: 52656 | elapsed time per iteration (ms): 15025.4 | learning rate: 1.459E-05 | global batch size: 32 | lm loss: 6.629485E+00 | loss scale: 16384.0 | grad norm: 57268.432 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3123/ 159576 | consumed samples: 52688 | elapsed time per iteration (ms): 14578.8 | learning rate: 1.460E-05 | global batch size: 32 | lm loss: 6.404414E+00 | loss scale: 16384.0 | grad norm: 63882.617 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3124/ 159576 | consumed samples: 52720 | elapsed time per iteration (ms): 14582.6 | learning rate: 1.461E-05 | global batch size: 32 | lm loss: 6.473093E+00 | loss scale: 16384.0 | grad norm: 50308.716 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3125/ 159576 | consumed samples: 52752 | elapsed time per iteration (ms): 14640.7 | learning rate: 1.461E-05 | global batch size: 32 | lm loss: 6.497868E+00 | loss scale: 16384.0 | grad norm: 63650.300 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3126/ 159576 | consumed samples: 52784 | elapsed time per iteration (ms): 15046.6 | learning rate: 1.462E-05 | global batch size: 32 | lm loss: 6.549313E+00 | loss scale: 16384.0 | grad norm: 72289.376 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3127/ 159576 | consumed samples: 52816 | elapsed time per iteration (ms): 14723.2 | learning rate: 1.463E-05 | global batch size: 32 | lm loss: 6.590129E+00 | loss scale: 16384.0 | grad norm: 47547.789 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3128/ 159576 | consumed samples: 52848 | elapsed time per iteration (ms): 14552.7 | learning rate: 1.464E-05 | global batch size: 32 | lm loss: 6.731832E+00 | loss scale: 16384.0 | grad norm: 68103.769 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3129/ 159576 | consumed samples: 52880 | elapsed time per iteration (ms): 14573.2 | learning rate: 1.465E-05 | global batch size: 32 | lm loss: 6.528438E+00 | loss scale: 16384.0 | grad norm: 57671.557 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3130/ 159576 | consumed samples: 52912 | elapsed time per iteration (ms): 14663.9 | learning rate: 1.466E-05 | global batch size: 32 | lm loss: 6.672345E+00 | loss scale: 16384.0 | grad norm: 42986.119 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3131/ 159576 | consumed samples: 52944 | elapsed time per iteration (ms): 14852.7 | learning rate: 1.467E-05 | global batch size: 32 | lm loss: 6.489813E+00 | loss scale: 16384.0 | grad norm: 54642.587 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3132/ 159576 | consumed samples: 52976 | elapsed time per iteration (ms): 14644.1 | learning rate: 1.468E-05 | global batch size: 32 | lm loss: 6.597792E+00 | loss scale: 16384.0 | grad norm: 52604.609 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3133/ 159576 | consumed samples: 53008 | elapsed time per iteration (ms): 14641.3 | learning rate: 1.468E-05 | global batch size: 32 | lm loss: 6.527011E+00 | loss scale: 16384.0 | grad norm: 59630.271 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3134/ 159576 | consumed samples: 53040 | elapsed time per iteration (ms): 14626.4 | learning rate: 1.469E-05 | global batch size: 32 | lm loss: 6.581876E+00 | loss scale: 16384.0 | grad norm: 57219.019 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3135/ 159576 | consumed samples: 53072 | elapsed time per iteration (ms): 14774.4 | learning rate: 1.470E-05 | global batch size: 32 | lm loss: 6.708944E+00 | loss scale: 16384.0 | grad norm: 55756.795 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3136/ 159576 | consumed samples: 53104 | elapsed time per iteration (ms): 14618.5 | learning rate: 1.471E-05 | global batch size: 32 | lm loss: 6.679635E+00 | loss scale: 16384.0 | grad norm: 42400.449 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3137/ 159576 | consumed samples: 53136 | elapsed time per iteration (ms): 14614.4 | learning rate: 1.472E-05 | global batch size: 32 | lm loss: 6.469272E+00 | loss scale: 16384.0 | grad norm: 142351.991 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3138/ 159576 | consumed samples: 53168 | elapsed time per iteration (ms): 14596.5 | learning rate: 1.473E-05 | global batch size: 32 | lm loss: 6.554899E+00 | loss scale: 16384.0 | grad norm: 98568.745 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3139/ 159576 | consumed samples: 53200 | elapsed time per iteration (ms): 14719.6 | learning rate: 1.474E-05 | global batch size: 32 | lm loss: 6.618309E+00 | loss scale: 16384.0 | grad norm: 73504.428 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3140/ 159576 | consumed samples: 53232 | elapsed time per iteration (ms): 14627.2 | learning rate: 1.475E-05 | global batch size: 32 | lm loss: 6.588873E+00 | loss scale: 16384.0 | grad norm: 73534.305 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3141/ 159576 | consumed samples: 53264 | elapsed time per iteration (ms): 14634.4 | learning rate: 1.476E-05 | global batch size: 32 | lm loss: 6.357007E+00 | loss scale: 16384.0 | grad norm: 84712.428 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3142/ 159576 | consumed samples: 53296 | elapsed time per iteration (ms): 14717.8 | learning rate: 1.476E-05 | global batch size: 32 | lm loss: 6.623076E+00 | loss scale: 16384.0 | grad norm: 94140.049 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3143/ 159576 | consumed samples: 53328 | elapsed time per iteration (ms): 14697.5 | learning rate: 1.477E-05 | global batch size: 32 | lm loss: 6.562120E+00 | loss scale: 16384.0 | grad norm: 60657.367 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3144/ 159576 | consumed samples: 53360 | elapsed time per iteration (ms): 14578.1 | learning rate: 1.478E-05 | global batch size: 32 | lm loss: 6.445246E+00 | loss scale: 16384.0 | grad norm: 61798.286 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3145/ 159576 | consumed samples: 53392 | elapsed time per iteration (ms): 14616.8 | learning rate: 1.479E-05 | global batch size: 32 | lm loss: 6.440137E+00 | loss scale: 16384.0 | grad norm: 72537.427 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3146/ 159576 | consumed samples: 53424 | elapsed time per iteration (ms): 14619.6 | learning rate: 1.480E-05 | global batch size: 32 | lm loss: 6.739626E+00 | loss scale: 16384.0 | grad norm: 53372.380 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3147/ 159576 | consumed samples: 53456 | elapsed time per iteration (ms): 14895.9 | learning rate: 1.481E-05 | global batch size: 32 | lm loss: 6.588343E+00 | loss scale: 16384.0 | grad norm: 132102.636 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3148/ 159576 | consumed samples: 53488 | elapsed time per iteration (ms): 14681.1 | learning rate: 1.482E-05 | global batch size: 32 | lm loss: 6.551591E+00 | loss scale: 16384.0 | grad norm: 58550.161 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3149/ 159576 | consumed samples: 53520 | elapsed time per iteration (ms): 14682.3 | learning rate: 1.483E-05 | global batch size: 32 | lm loss: 6.632958E+00 | loss scale: 16384.0 | grad norm: 77007.903 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3150/ 159576 | consumed samples: 53552 | elapsed time per iteration (ms): 14624.1 | learning rate: 1.484E-05 | global batch size: 32 | lm loss: 6.648820E+00 | loss scale: 16384.0 | grad norm: 86896.917 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3151/ 159576 | consumed samples: 53584 | elapsed time per iteration (ms): 14845.8 | learning rate: 1.484E-05 | global batch size: 32 | lm loss: 6.446036E+00 | loss scale: 16384.0 | grad norm: 89979.541 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3152/ 159576 | consumed samples: 53616 | elapsed time per iteration (ms): 14727.8 | learning rate: 1.485E-05 | global batch size: 32 | lm loss: 6.617037E+00 | loss scale: 16384.0 | grad norm: 58488.767 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3153/ 159576 | consumed samples: 53648 | elapsed time per iteration (ms): 14649.7 | learning rate: 1.486E-05 | global batch size: 32 | lm loss: 6.529748E+00 | loss scale: 16384.0 | grad norm: 74833.007 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3154/ 159576 | consumed samples: 53680 | elapsed time per iteration (ms): 14647.6 | learning rate: 1.487E-05 | global batch size: 32 | lm loss: 6.562946E+00 | loss scale: 16384.0 | grad norm: 52935.305 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3155/ 159576 | consumed samples: 53712 | elapsed time per iteration (ms): 15107.7 | learning rate: 1.488E-05 | global batch size: 32 | lm loss: 6.514643E+00 | loss scale: 16384.0 | grad norm: 115570.754 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3156/ 159576 | consumed samples: 53744 | elapsed time per iteration (ms): 14720.1 | learning rate: 1.489E-05 | global batch size: 32 | lm loss: 6.684644E+00 | loss scale: 16384.0 | grad norm: 80957.169 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3157/ 159576 | consumed samples: 53776 | elapsed time per iteration (ms): 14692.8 | learning rate: 1.490E-05 | global batch size: 32 | lm loss: 6.519046E+00 | loss scale: 16384.0 | grad norm: 55678.918 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3158/ 159576 | consumed samples: 53808 | elapsed time per iteration (ms): 14686.5 | learning rate: 1.491E-05 | global batch size: 32 | lm loss: 6.746099E+00 | loss scale: 16384.0 | grad norm: 90492.004 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3159/ 159576 | consumed samples: 53840 | elapsed time per iteration (ms): 15011.1 | learning rate: 1.492E-05 | global batch size: 32 | lm loss: 6.536778E+00 | loss scale: 16384.0 | grad norm: 71520.569 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3160/ 159576 | consumed samples: 53872 | elapsed time per iteration (ms): 14579.4 | learning rate: 1.492E-05 | global batch size: 32 | lm loss: 6.666056E+00 | loss scale: 16384.0 | grad norm: 84616.230 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3161/ 159576 | consumed samples: 53904 | elapsed time per iteration (ms): 14644.1 | learning rate: 1.493E-05 | global batch size: 32 | lm loss: 6.597644E+00 | loss scale:
16384.0 | grad norm: 75093.664 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3162/ 159576 | consumed samples: 53936 | elapsed time per iteration (ms): 14697.1 | learning rate: 1.494E-05 | global batch size: 32 | lm loss: 6.446161E+00 | loss scale: 16384.0 | grad norm: 65649.952 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3163/ 159576 | consumed samples: 53968 | elapsed time per iteration (ms): 14947.2 | learning rate: 1.495E-05 | global batch size: 32 | lm loss: 6.681765E+00 | loss scale: 16384.0 | grad norm: 60219.876 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3164/ 159576 | consumed samples: 54000 | elapsed time per iteration (ms): 14663.4 | learning rate: 1.496E-05 | global batch size: 32 | lm loss: 6.525707E+00 | loss scale: 16384.0 | grad norm: 68154.761 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3165/ 159576 | consumed samples: 54032 | elapsed time per iteration (ms): 14769.3 | learning rate: 1.497E-05 | global batch size: 32 | lm loss: 6.587021E+00 | loss scale: 16384.0 | grad norm: 78180.401 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3166/ 159576 | consumed samples: 54064 | elapsed time per iteration (ms): 14610.2 | learning rate: 1.498E-05 | global batch size: 32 | lm loss: 6.519161E+00 | loss scale: 16384.0 | grad norm: 61912.261 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3167/ 159576 | consumed samples: 54096 | elapsed time per iteration (ms): 14999.0 | learning rate: 1.499E-05 | global batch size: 32 | lm loss: 6.632318E+00 | loss scale: 16384.0 | grad norm: 108253.525 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3168/ 159576 | consumed samples: 
54128 | elapsed time per iteration (ms): 14650.1 | learning rate: 1.500E-05 | global batch size: 32 | lm loss: 6.465475E+00 | loss scale: 16384.0 | grad norm: 62950.421 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3169/ 159576 | consumed samples: 54160 | elapsed time per iteration (ms): 14661.3 | learning rate: 1.500E-05 | global batch size: 32 | lm loss: 6.539711E+00 | loss scale: 16384.0 | grad norm: 92615.638 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3170/ 159576 | consumed samples: 54192 | elapsed time per iteration (ms): 14674.1 | learning rate: 1.501E-05 | global batch size: 32 | lm loss: 6.579189E+00 | loss scale: 16384.0 | grad norm: 83785.863 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3171/ 159576 | consumed samples: 54224 | elapsed time per iteration (ms): 15070.8 | learning rate: 1.502E-05 | global batch size: 32 | lm loss: 6.793476E+00 | loss scale: 16384.0 | grad norm: 62540.200 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3172/ 159576 | consumed samples: 54256 | elapsed time per iteration (ms): 14666.7 | learning rate: 1.503E-05 | global batch size: 32 | lm loss: 6.584558E+00 | loss scale: 16384.0 | grad norm: 112108.286 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3173/ 159576 | consumed samples: 54288 | elapsed time per iteration (ms): 14625.8 | learning rate: 1.504E-05 | global batch size: 32 | lm loss: 6.600308E+00 | loss scale: 16384.0 | grad norm: 74654.549 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3174/ 159576 | consumed samples: 54320 | elapsed time per iteration (ms): 14636.6 | learning rate: 1.505E-05 | global batch size: 32 | lm loss: 6.586472E+00 | loss scale: 16384.0 | grad norm: 64570.380 
| num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3175/ 159576 | consumed samples: 54352 | elapsed time per iteration (ms): 15097.6 | learning rate: 1.506E-05 | global batch size: 32 | lm loss: 6.611074E+00 | loss scale: 16384.0 | grad norm: 67988.200 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3176/ 159576 | consumed samples: 54384 | elapsed time per iteration (ms): 14507.7 | learning rate: 1.507E-05 | global batch size: 32 | lm loss: 6.524911E+00 | loss scale: 16384.0 | grad norm: 52695.097 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3177/ 159576 | consumed samples: 54416 | elapsed time per iteration (ms): 14667.9 | learning rate: 1.508E-05 | global batch size: 32 | lm loss: 6.622879E+00 | loss scale: 16384.0 | grad norm: 96311.880 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3178/ 159576 | consumed samples: 54448 | elapsed time per iteration (ms): 14717.9 | learning rate: 1.508E-05 | global batch size: 32 | lm loss: 6.557679E+00 | loss scale: 16384.0 | grad norm: 75112.233 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3179/ 159576 | consumed samples: 54480 | elapsed time per iteration (ms): 15028.6 | learning rate: 1.509E-05 | global batch size: 32 | lm loss: 6.508760E+00 | loss scale: 16384.0 | grad norm: 67929.222 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3180/ 159576 | consumed samples: 54512 | elapsed time per iteration (ms): 14774.6 | learning rate: 1.510E-05 | global batch size: 32 | lm loss: 6.573524E+00 | loss scale: 16384.0 | grad norm: 76526.339 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3181/ 159576 | consumed samples: 54544 | elapsed time per 
iteration (ms): 14648.5 | learning rate: 1.511E-05 | global batch size: 32 | lm loss: 6.629518E+00 | loss scale: 16384.0 | grad norm: 51441.208 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3182/ 159576 | consumed samples: 54576 | elapsed time per iteration (ms): 14620.2 | learning rate: 1.512E-05 | global batch size: 32 | lm loss: 6.528477E+00 | loss scale: 16384.0 | grad norm: 84031.596 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3183/ 159576 | consumed samples: 54608 | elapsed time per iteration (ms): 14671.0 | learning rate: 1.513E-05 | global batch size: 32 | lm loss: 6.450350E+00 | loss scale: 16384.0 | grad norm: 47787.227 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3184/ 159576 | consumed samples: 54640 | elapsed time per iteration (ms): 14835.3 | learning rate: 1.514E-05 | global batch size: 32 | lm loss: 6.547495E+00 | loss scale: 16384.0 | grad norm: 57635.062 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3185/ 159576 | consumed samples: 54672 | elapsed time per iteration (ms): 14691.4 | learning rate: 1.515E-05 | global batch size: 32 | lm loss: 6.438165E+00 | loss scale: 16384.0 | grad norm: 59205.412 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3186/ 159576 | consumed samples: 54704 | elapsed time per iteration (ms): 14599.9 | learning rate: 1.516E-05 | global batch size: 32 | lm loss: 6.543282E+00 | loss scale: 16384.0 | grad norm: 56916.506 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3187/ 159576 | consumed samples: 54736 | elapsed time per iteration (ms): 14594.3 | learning rate: 1.516E-05 | global batch size: 32 | lm loss: 6.619707E+00 | loss scale: 16384.0 | grad norm: 87429.373 | num zeros: 0.0 | number 
of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3188/ 159576 | consumed samples: 54768 | elapsed time per iteration (ms): 14717.0 | learning rate: 1.517E-05 | global batch size: 32 | lm loss: 6.575029E+00 | loss scale: 16384.0 | grad norm: 63063.172 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3189/ 159576 | consumed samples: 54800 | elapsed time per iteration (ms): 14535.7 | learning rate: 1.518E-05 | global batch size: 32 | lm loss: 6.572168E+00 | loss scale: 16384.0 | grad norm: 85759.591 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3190/ 159576 | consumed samples: 54832 | elapsed time per iteration (ms): 14535.8 | learning rate: 1.519E-05 | global batch size: 32 | lm loss: 6.540303E+00 | loss scale: 16384.0 | grad norm: 59464.367 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3191/ 159576 | consumed samples: 54864 | elapsed time per iteration (ms): 14477.2 | learning rate: 1.520E-05 | global batch size: 32 | lm loss: 6.545095E+00 | loss scale: 16384.0 | grad norm: 53870.876 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3192/ 159576 | consumed samples: 54896 | elapsed time per iteration (ms): 14651.8 | learning rate: 1.521E-05 | global batch size: 32 | lm loss: 6.497169E+00 | loss scale: 16384.0 | grad norm: 50516.018 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3193/ 159576 | consumed samples: 54928 | elapsed time per iteration (ms): 14555.7 | learning rate: 1.522E-05 | global batch size: 32 | lm loss: 6.354692E+00 | loss scale: 16384.0 | grad norm: 67216.716 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3194/ 159576 | consumed samples: 54960 | elapsed time per iteration (ms): 14548.6 | learning 
rate: 1.523E-05 | global batch size: 32 | lm loss: 6.704625E+00 | loss scale: 16384.0 | grad norm: 64544.434 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3195/ 159576 | consumed samples: 54992 | elapsed time per iteration (ms): 14549.1 | learning rate: 1.524E-05 | global batch size: 32 | lm loss: 6.489696E+00 | loss scale: 16384.0 | grad norm: 43746.021 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3196/ 159576 | consumed samples: 55024 | elapsed time per iteration (ms): 14783.1 | learning rate: 1.524E-05 | global batch size: 32 | lm loss: 6.496898E+00 | loss scale: 16384.0 | grad norm: 146573.122 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3197/ 159576 | consumed samples: 55056 | elapsed time per iteration (ms): 14527.9 | learning rate: 1.525E-05 | global batch size: 32 | lm loss: 6.568567E+00 | loss scale: 16384.0 | grad norm: 78804.650 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3198/ 159576 | consumed samples: 55088 | elapsed time per iteration (ms): 14523.2 | learning rate: 1.526E-05 | global batch size: 32 | lm loss: 6.598960E+00 | loss scale: 16384.0 | grad norm: 96783.060 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3199/ 159576 | consumed samples: 55120 | elapsed time per iteration (ms): 14540.7 | learning rate: 1.527E-05 | global batch size: 32 | lm loss: 6.572606E+00 | loss scale: 16384.0 | grad norm: 89417.690 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3200/ 159576 | consumed samples: 55152 | elapsed time per iteration (ms): 15008.9 | learning rate: 1.528E-05 | global batch size: 32 | lm loss: 6.506562E+00 | loss scale: 16384.0 | grad norm: 41993.325 | num zeros: 0.0 | number of skipped iterations: 0 | number 
of nan iterations: 0 | time (ms) iteration 3201/ 159576 | consumed samples: 55184 | elapsed time per iteration (ms): 14658.0 | learning rate: 1.529E-05 | global batch size: 32 | lm loss: 6.782739E+00 | loss scale: 16384.0 | grad norm: 352113.189 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3202/ 159576 | consumed samples: 55216 | elapsed time per iteration (ms): 14567.2 | learning rate: 1.530E-05 | global batch size: 32 | lm loss: 6.567737E+00 | loss scale: 16384.0 | grad norm: 255563.854 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3203/ 159576 | consumed samples: 55248 | elapsed time per iteration (ms): 14521.2 | learning rate: 1.531E-05 | global batch size: 32 | lm loss: 6.758952E+00 | loss scale: 16384.0 | grad norm: 132639.437 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3204/ 159576 | consumed samples: 55280 | elapsed time per iteration (ms): 15057.0 | learning rate: 1.532E-05 | global batch size: 32 | lm loss: 6.644050E+00 | loss scale: 16384.0 | grad norm: 95206.523 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3205/ 159576 | consumed samples: 55312 | elapsed time per iteration (ms): 14632.3 | learning rate: 1.532E-05 | global batch size: 32 | lm loss: 6.559070E+00 | loss scale: 16384.0 | grad norm: 92448.552 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3206/ 159576 | consumed samples: 55344 | elapsed time per iteration (ms): 14560.7 | learning rate: 1.533E-05 | global batch size: 32 | lm loss: 6.544364E+00 | loss scale: 16384.0 | grad norm: 87185.641 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3207/ 159576 | consumed samples: 55376 | elapsed time per iteration (ms): 14559.6 | learning rate: 1.534E-05 | global batch 
size: 32 | lm loss: 6.617725E+00 | loss scale: 16384.0 | grad norm: 147534.405 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3208/ 159576 | consumed samples: 55408 | elapsed time per iteration (ms): 14919.1 | learning rate: 1.535E-05 | global batch size: 32 | lm loss: 6.505226E+00 | loss scale: 16384.0 | grad norm: 82317.664 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3209/ 159576 | consumed samples: 55440 | elapsed time per iteration (ms): 14628.9 | learning rate: 1.536E-05 | global batch size: 32 | lm loss: 6.529959E+00 | loss scale: 16384.0 | grad norm: 62063.357 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3210/ 159576 | consumed samples: 55472 | elapsed time per iteration (ms): 14562.8 | learning rate: 1.537E-05 | global batch size: 32 | lm loss: 6.499523E+00 | loss scale: 16384.0 | grad norm: 59027.974 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3211/ 159576 | consumed samples: 55504 | elapsed time per iteration (ms): 14551.3 | learning rate: 1.538E-05 | global batch size: 32 | lm loss: 6.612097E+00 | loss scale: 16384.0 | grad norm: 142076.623 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3212/ 159576 | consumed samples: 55536 | elapsed time per iteration (ms): 14906.9 | learning rate: 1.539E-05 | global batch size: 32 | lm loss: 6.726549E+00 | loss scale: 16384.0 | grad norm: 85971.039 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3213/ 159576 | consumed samples: 55568 | elapsed time per iteration (ms): 14484.4 | learning rate: 1.539E-05 | global batch size: 32 | lm loss: 6.627134E+00 | loss scale: 16384.0 | grad norm: 74784.069 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time 
(ms) iteration 3214/ 159576 | consumed samples: 55600 | elapsed time per iteration (ms): 14568.5 | learning rate: 1.540E-05 | global batch size: 32 | lm loss: 6.684568E+00 | loss scale: 16384.0 | grad norm: 85537.156 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3215/ 159576 | consumed samples: 55632 | elapsed time per iteration (ms): 14541.7 | learning rate: 1.541E-05 | global batch size: 32 | lm loss: 6.632449E+00 | loss scale: 16384.0 | grad norm: 118554.262 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3216/ 159576 | consumed samples: 55664 | elapsed time per iteration (ms): 14903.9 | learning rate: 1.542E-05 | global batch size: 32 | lm loss: 6.491426E+00 | loss scale: 16384.0 | grad norm: 66361.502 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3217/ 159576 | consumed samples: 55696 | elapsed time per iteration (ms): 14654.1 | learning rate: 1.543E-05 | global batch size: 32 | lm loss: 6.599683E+00 | loss scale: 16384.0 | grad norm: 66284.456 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3218/ 159576 | consumed samples: 55728 | elapsed time per iteration (ms): 14564.4 | learning rate: 1.544E-05 | global batch size: 32 | lm loss: 6.671634E+00 | loss scale: 16384.0 | grad norm: 48626.750 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3219/ 159576 | consumed samples: 55760 | elapsed time per iteration (ms): 14567.8 | learning rate: 1.545E-05 | global batch size: 32 | lm loss: 6.653804E+00 | loss scale: 16384.0 | grad norm: 84407.596 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3220/ 159576 | consumed samples: 55792 | elapsed time per iteration (ms): 14939.3 | learning rate: 1.546E-05 | global batch size: 32 | lm loss: 
6.519379E+00 | loss scale: 16384.0 | grad norm: 72885.533 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3221/ 159576 | consumed samples: 55824 | elapsed time per iteration (ms): 14579.8 | learning rate: 1.547E-05 | global batch size: 32 | lm loss: 6.658468E+00 | loss scale: 16384.0 | grad norm: 69063.419 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3222/ 159576 | consumed samples: 55856 | elapsed time per iteration (ms): 14568.3 | learning rate: 1.547E-05 | global batch size: 32 | lm loss: 6.544227E+00 | loss scale: 16384.0 | grad norm: 94167.013 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3223/ 159576 | consumed samples: 55888 | elapsed time per iteration (ms): 14530.3 | learning rate: 1.548E-05 | global batch size: 32 | lm loss: 6.519998E+00 | loss scale: 16384.0 | grad norm: 74630.691 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3224/ 159576 | consumed samples: 55920 | elapsed time per iteration (ms): 14849.7 | learning rate: 1.549E-05 | global batch size: 32 | lm loss: 6.586551E+00 | loss scale: 16384.0 | grad norm: 76630.181 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3225/ 159576 | consumed samples: 55952 | elapsed time per iteration (ms): 14888.8 | learning rate: 1.550E-05 | global batch size: 32 | lm loss: 6.687891E+00 | loss scale: 16384.0 | grad norm: 70630.932 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3226/ 159576 | consumed samples: 55984 | elapsed time per iteration (ms): 14540.3 | learning rate: 1.551E-05 | global batch size: 32 | lm loss: 6.595382E+00 | loss scale: 16384.0 | grad norm: 92178.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3227/ 
159576 | consumed samples: 56016 | elapsed time per iteration (ms): 14557.7 | learning rate: 1.552E-05 | global batch size: 32 | lm loss: 6.364616E+00 | loss scale: 16384.0 | grad norm: 62395.737 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3228/ 159576 | consumed samples: 56048 | elapsed time per iteration (ms): 14547.2 | learning rate: 1.553E-05 | global batch size: 32 | lm loss: 6.614971E+00 | loss scale: 16384.0 | grad norm: 72348.132 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3229/ 159576 | consumed samples: 56080 | elapsed time per iteration (ms): 14765.8 | learning rate: 1.554E-05 | global batch size: 32 | lm loss: 6.527470E+00 | loss scale: 16384.0 | grad norm: 70068.847 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3230/ 159576 | consumed samples: 56112 | elapsed time per iteration (ms): 14547.7 | learning rate: 1.555E-05 | global batch size: 32 | lm loss: 6.691795E+00 | loss scale: 16384.0 | grad norm: 79540.792 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3231/ 159576 | consumed samples: 56144 | elapsed time per iteration (ms): 14659.9 | learning rate: 1.555E-05 | global batch size: 32 | lm loss: 6.541613E+00 | loss scale: 16384.0 | grad norm: 49841.975 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3232/ 159576 | consumed samples: 56176 | elapsed time per iteration (ms): 14501.9 | learning rate: 1.556E-05 | global batch size: 32 | lm loss: 6.634310E+00 | loss scale: 16384.0 | grad norm: 67541.885 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3233/ 159576 | consumed samples: 56208 | elapsed time per iteration (ms): 14751.5 | learning rate: 1.557E-05 | global batch size: 32 | lm loss: 6.538262E+00 | loss scale: 
16384.0 | grad norm: 60234.071 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3234/ 159576 | consumed samples: 56240 | elapsed time per iteration (ms): 14540.9 | learning rate: 1.558E-05 | global batch size: 32 | lm loss: 6.572741E+00 | loss scale: 16384.0 | grad norm: 51996.631 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3235/ 159576 | consumed samples: 56272 | elapsed time per iteration (ms): 14525.6 | learning rate: 1.559E-05 | global batch size: 32 | lm loss: 6.514688E+00 | loss scale: 16384.0 | grad norm: 80129.382 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3236/ 159576 | consumed samples: 56304 | elapsed time per iteration (ms): 14525.2 | learning rate: 1.560E-05 | global batch size: 32 | lm loss: 6.597489E+00 | loss scale: 16384.0 | grad norm: 106848.471 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3237/ 159576 | consumed samples: 56336 | elapsed time per iteration (ms): 14776.9 | learning rate: 1.561E-05 | global batch size: 32 | lm loss: 6.556981E+00 | loss scale: 16384.0 | grad norm: 71439.752 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3238/ 159576 | consumed samples: 56368 | elapsed time per iteration (ms): 14561.5 | learning rate: 1.562E-05 | global batch size: 32 | lm loss: 6.569613E+00 | loss scale: 16384.0 | grad norm: 70525.236 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3239/ 159576 | consumed samples: 56400 | elapsed time per iteration (ms): 14478.4 | learning rate: 1.563E-05 | global batch size: 32 | lm loss: 6.541091E+00 | loss scale: 16384.0 | grad norm: 47017.630 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3240/ 159576 | consumed samples: 
56432 | elapsed time per iteration (ms): 14587.1 | learning rate: 1.563E-05 | global batch size: 32 | lm loss: 6.697134E+00 | loss scale: 16384.0 | grad norm: 53866.228 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3241/ 159576 | consumed samples: 56464 | elapsed time per iteration (ms): 14901.2 | learning rate: 1.564E-05 | global batch size: 32 | lm loss: 6.463998E+00 | loss scale: 16384.0 | grad norm: 72517.961 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3242/ 159576 | consumed samples: 56496 | elapsed time per iteration (ms): 14602.2 | learning rate: 1.565E-05 | global batch size: 32 | lm loss: 6.557918E+00 | loss scale: 16384.0 | grad norm: 51986.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3243/ 159576 | consumed samples: 56528 | elapsed time per iteration (ms): 14553.6 | learning rate: 1.566E-05 | global batch size: 32 | lm loss: 6.491773E+00 | loss scale: 16384.0 | grad norm: 68222.486 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3244/ 159576 | consumed samples: 56560 | elapsed time per iteration (ms): 14559.7 | learning rate: 1.567E-05 | global batch size: 32 | lm loss: 6.590208E+00 | loss scale: 16384.0 | grad norm: 72691.281 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3245/ 159576 | consumed samples: 56592 | elapsed time per iteration (ms): 14894.6 | learning rate: 1.568E-05 | global batch size: 32 | lm loss: 6.551069E+00 | loss scale: 16384.0 | grad norm: 71227.557 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3246/ 159576 | consumed samples: 56624 | elapsed time per iteration (ms): 14706.4 | learning rate: 1.569E-05 | global batch size: 32 | lm loss: 6.536276E+00 | loss scale: 16384.0 | grad norm: 77853.983 
| num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3247/ 159576 | consumed samples: 56656 | elapsed time per iteration (ms): 14557.1 | learning rate: 1.570E-05 | global batch size: 32 | lm loss: 6.547366E+00 | loss scale: 16384.0 | grad norm: 91853.496 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3248/ 159576 | consumed samples: 56688 | elapsed time per iteration (ms): 14512.9 | learning rate: 1.571E-05 | global batch size: 32 | lm loss: 6.604490E+00 | loss scale: 16384.0 | grad norm: 61725.711 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3249/ 159576 | consumed samples: 56720 | elapsed time per iteration (ms): 14949.1 | learning rate: 1.571E-05 | global batch size: 32 | lm loss: 6.555557E+00 | loss scale: 16384.0 | grad norm: 55414.359 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3250/ 159576 | consumed samples: 56752 | elapsed time per iteration (ms): 14468.6 | learning rate: 1.572E-05 | global batch size: 32 | lm loss: 6.471034E+00 | loss scale: 16384.0 | grad norm: 39264.272 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3251/ 159576 | consumed samples: 56784 | elapsed time per iteration (ms): 14601.9 | learning rate: 1.573E-05 | global batch size: 32 | lm loss: 6.472137E+00 | loss scale: 16384.0 | grad norm: 51720.854 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3252/ 159576 | consumed samples: 56816 | elapsed time per iteration (ms): 14481.3 | learning rate: 1.574E-05 | global batch size: 32 | lm loss: 6.564797E+00 | loss scale: 16384.0 | grad norm: 55129.631 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3253/ 159576 | consumed samples: 56848 | elapsed time per 
iteration (ms): 14865.7 | learning rate: 1.575E-05 | global batch size: 32 | lm loss: 6.433147E+00 | loss scale: 16384.0 | grad norm: 48761.095 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3254/ 159576 | consumed samples: 56880 | elapsed time per iteration (ms): 14607.7 | learning rate: 1.576E-05 | global batch size: 32 | lm loss: 6.486347E+00 | loss scale: 16384.0 | grad norm: 51447.567 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3255/ 159576 | consumed samples: 56912 | elapsed time per iteration (ms): 14476.2 | learning rate: 1.577E-05 | global batch size: 32 | lm loss: 6.670080E+00 | loss scale: 16384.0 | grad norm: 49692.027 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3256/ 159576 | consumed samples: 56944 | elapsed time per iteration (ms): 14532.2 | learning rate: 1.578E-05 | global batch size: 32 | lm loss: 6.449496E+00 | loss scale: 16384.0 | grad norm: 46597.035 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3257/ 159576 | consumed samples: 56976 | elapsed time per iteration (ms): 14907.4 | learning rate: 1.579E-05 | global batch size: 32 | lm loss: 6.651023E+00 | loss scale: 16384.0 | grad norm: 50509.142 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3258/ 159576 | consumed samples: 57008 | elapsed time per iteration (ms): 14521.0 | learning rate: 1.579E-05 | global batch size: 32 | lm loss: 6.557060E+00 | loss scale: 16384.0 | grad norm: 46431.742 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3259/ 159576 | consumed samples: 57040 | elapsed time per iteration (ms): 14527.8 | learning rate: 1.580E-05 | global batch size: 32 | lm loss: 6.802115E+00 | loss scale: 16384.0 | grad norm: 46019.500 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3260/ 159576 | consumed samples: 57072 | elapsed time per iteration (ms): 14560.3 | learning rate: 1.581E-05 | global batch size: 32 | lm loss: 6.480462E+00 | loss scale: 16384.0 | grad norm: 54023.847 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3261/ 159576 | consumed samples: 57104 | elapsed time per iteration (ms): 14898.0 | learning rate: 1.582E-05 | global batch size: 32 | lm loss: 6.696016E+00 | loss scale: 16384.0 | grad norm: 51541.626 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3262/ 159576 | consumed samples: 57136 | elapsed time per iteration (ms): 14574.6 | learning rate: 1.583E-05 | global batch size: 32 | lm loss: 6.633371E+00 | loss scale: 16384.0 | grad norm: 64314.799 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3263/ 159576 | consumed samples: 57168 | elapsed time per iteration (ms): 14524.2 | learning rate: 1.584E-05 | global batch size: 32 | lm loss: 6.540409E+00 | loss scale: 16384.0 | grad norm: 53098.615 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3264/ 159576 | consumed samples: 57200 | elapsed time per iteration (ms): 14557.6 | learning rate: 1.585E-05 | global batch size: 32 | lm loss: 6.376970E+00 | loss scale: 32768.0 | grad norm: 75107.502 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3265/ 159576 | consumed samples: 57232 | elapsed time per iteration (ms): 14784.4 | learning rate: 1.586E-05 | global batch size: 32 | lm loss: 6.602743E+00 | loss scale: 32768.0 | grad norm: 125297.978 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3266/ 159576 | consumed samples: 57264 | elapsed time per iteration (ms): 14634.8 | learning rate: 1.587E-05 | global batch size: 32 | lm loss: 6.514446E+00 | loss scale: 32768.0 | grad norm: 194672.983 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3267/ 159576 | consumed samples: 57296 | elapsed time per iteration (ms): 14570.9 | learning rate: 1.587E-05 | global batch size: 32 | lm loss: 6.630837E+00 | loss scale: 32768.0 | grad norm: 107205.101 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3268/ 159576 | consumed samples: 57328 | elapsed time per iteration (ms): 14454.1 | learning rate: 1.588E-05 | global batch size: 32 | lm loss: 6.541512E+00 | loss scale: 32768.0 | grad norm: 112309.244 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3269/ 159576 | consumed samples: 57360 | elapsed time per iteration (ms): 14551.3 | learning rate: 1.589E-05 | global batch size: 32 | lm loss: 6.542883E+00 | loss scale: 32768.0 | grad norm: 132672.039 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3270/ 159576 | consumed samples: 57392 | elapsed time per iteration (ms): 14718.7 | learning rate: 1.590E-05 | global batch size: 32 | lm loss: 6.448256E+00 | loss scale: 32768.0 | grad norm: 151950.192 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3271/ 159576 | consumed samples: 57424 | elapsed time per iteration (ms): 14527.0 | learning rate: 1.591E-05 | global batch size: 32 | lm loss: 6.688755E+00 | loss scale: 32768.0 | grad norm: 91675.285 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3272/ 159576 | consumed samples: 57456 | elapsed time per iteration (ms): 14559.6 | learning rate: 1.592E-05 | global batch size: 32 | lm loss: 6.550324E+00 | loss scale: 32768.0 | grad norm: 241437.766 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3273/ 159576 | consumed samples: 57488 | elapsed time per iteration (ms): 14521.4 | learning rate: 1.593E-05 | global batch size: 32 | lm loss: 6.620804E+00 | loss scale: 32768.0 | grad norm: 130842.612 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3274/ 159576 | consumed samples: 57520 | elapsed time per iteration (ms): 14697.5 | learning rate: 1.594E-05 | global batch size: 32 | lm loss: 6.459725E+00 | loss scale: 32768.0 | grad norm: 146465.533 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3275/ 159576 | consumed samples: 57552 | elapsed time per iteration (ms): 14476.2 | learning rate: 1.595E-05 | global batch size: 32 | lm loss: 6.576751E+00 | loss scale: 32768.0 | grad norm: 114711.531 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3276/ 159576 | consumed samples: 57584 | elapsed time per iteration (ms): 14512.4 | learning rate: 1.595E-05 | global batch size: 32 | lm loss: 6.599717E+00 | loss scale: 32768.0 | grad norm: 283220.301 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3277/ 159576 | consumed samples: 57616 | elapsed time per iteration (ms): 14565.0 | learning rate: 1.596E-05 | global batch size: 32 | lm loss: 6.395351E+00 | loss scale: 32768.0 | grad norm: 206105.392 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3278/ 159576 | consumed samples: 57648 | elapsed time per iteration (ms): 14816.8 | learning rate: 1.597E-05 | global batch size: 32 | lm loss: 6.569580E+00 | loss scale: 32768.0 | grad norm: 183586.115 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3279/ 159576 | consumed samples: 57680 | elapsed time per iteration (ms): 14615.5 | learning rate: 1.598E-05 | global batch size: 32 | lm loss: 6.572281E+00 | loss scale: 32768.0 | grad norm: 161878.009 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3280/ 159576 | consumed samples: 57712 | elapsed time per iteration (ms): 14521.1 | learning rate: 1.599E-05 | global batch size: 32 | lm loss: 6.513469E+00 | loss scale: 32768.0 | grad norm: 134922.801 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3281/ 159576 | consumed samples: 57744 | elapsed time per iteration (ms): 14549.6 | learning rate: 1.600E-05 | global batch size: 32 | lm loss: 6.680450E+00 | loss scale: 32768.0 | grad norm: 214593.393 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3282/ 159576 | consumed samples: 57776 | elapsed time per iteration (ms): 14885.6 | learning rate: 1.601E-05 | global batch size: 32 | lm loss: 6.528894E+00 | loss scale: 32768.0 | grad norm: 136120.139 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3283/ 159576 | consumed samples: 57808 | elapsed time per iteration (ms): 14648.1 | learning rate: 1.602E-05 | global batch size: 32 | lm loss: 6.610715E+00 | loss scale: 32768.0 | grad norm: 124689.254 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3284/ 159576 | consumed samples: 57840 | elapsed time per iteration (ms): 14446.0 | learning rate: 1.603E-05 | global batch size: 32 | lm loss: 6.493599E+00 | loss scale: 32768.0 | grad norm: 193703.601 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3285/ 159576 | consumed samples: 57872 | elapsed time per iteration (ms): 14530.4 | learning rate: 1.603E-05 | global batch size: 32 | lm loss: 6.495665E+00 | loss scale: 32768.0 | grad norm: 180680.235 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3286/ 159576 | consumed samples: 57904 | elapsed time per iteration (ms): 15079.8 | learning rate: 1.604E-05 | global batch size: 32 | lm loss: 6.484368E+00 | loss scale: 32768.0 | grad norm: 151352.752 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3287/ 159576 | consumed samples: 57936 | elapsed time per iteration (ms): 14519.7 | learning rate: 1.605E-05 | global batch size: 32 | lm loss: 6.533234E+00 | loss scale: 32768.0 | grad norm: 135972.403 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3288/ 159576 | consumed samples: 57968 | elapsed time per iteration (ms): 14502.1 | learning rate: 1.606E-05 | global batch size: 32 | lm loss: 6.485931E+00 | loss scale: 32768.0 | grad norm: 175469.781 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3289/ 159576 | consumed samples: 58000 | elapsed time per iteration (ms): 14650.6 | learning rate: 1.607E-05 | global batch size: 32 | lm loss: 6.588792E+00 | loss scale: 32768.0 | grad norm: 95804.558 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3290/ 159576 | consumed samples: 58032 | elapsed time per iteration (ms): 15011.0 | learning rate: 1.608E-05 | global batch size: 32 | lm loss: 6.649066E+00 | loss scale: 32768.0 | grad norm: 158912.035 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3291/ 159576 | consumed samples: 58064 | elapsed time per iteration (ms): 14545.2 | learning rate: 1.609E-05 | global batch size: 32 | lm loss: 6.518328E+00 | loss scale: 32768.0 | grad norm: 143118.125 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3292/ 159576 | consumed samples: 58096 | elapsed time per iteration (ms): 14548.9 | learning rate: 1.610E-05 | global batch size: 32 | lm loss: 6.497085E+00 | loss scale: 32768.0 | grad norm: 242609.168 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3293/ 159576 | consumed samples: 58128 | elapsed time per iteration (ms): 14674.4 | learning rate: 1.611E-05 | global batch size: 32 | lm loss: 6.516074E+00 | loss scale: 32768.0 | grad norm: 230563.177 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3294/ 159576 | consumed samples: 58160 | elapsed time per iteration (ms): 15018.5 | learning rate: 1.611E-05 | global batch size: 32 | lm loss: 6.357250E+00 | loss scale: 32768.0 | grad norm: 145279.947 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3295/ 159576 | consumed samples: 58192 | elapsed time per iteration (ms): 14502.4 | learning rate: 1.612E-05 | global batch size: 32 | lm loss: 6.532835E+00 | loss scale: 32768.0 | grad norm: 159209.410 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3296/ 159576 | consumed samples: 58224 | elapsed time per iteration (ms): 14618.1 | learning rate: 1.613E-05 | global batch size: 32 | lm loss: 6.610238E+00 | loss scale: 32768.0 | grad norm: 103662.453 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3297/ 159576 | consumed samples: 58256 | elapsed time per iteration (ms): 14641.0 | learning rate: 1.614E-05 | global batch size: 32 | lm loss: 6.559636E+00 | loss scale: 32768.0 | grad norm: 342247.705 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3298/ 159576 | consumed samples: 58288 | elapsed time per iteration (ms): 14987.0 | learning rate: 1.615E-05 | global batch size: 32 | lm loss: 6.595356E+00 | loss scale: 32768.0 | grad norm: 185444.091 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3299/ 159576 | consumed samples: 58320 | elapsed time per iteration (ms): 14547.8 | learning rate: 1.616E-05 | global batch size: 32 | lm loss: 6.538537E+00 | loss scale: 32768.0 | grad norm: 145127.777 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3300/ 159576 | consumed samples: 58352 | elapsed time per iteration (ms): 14643.9 | learning rate: 1.617E-05 | global batch size: 32 | lm loss: 6.453721E+00 | loss scale: 32768.0 | grad norm: 235646.726 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3301/ 159576 | consumed samples: 58384 | elapsed time per iteration (ms): 14648.1 | learning rate: 1.618E-05 | global batch size: 32 | lm loss: 6.672456E+00 | loss scale: 32768.0 | grad norm: 131805.014 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3302/ 159576 | consumed samples: 58416 | elapsed time per iteration (ms): 15043.8 | learning rate: 1.618E-05 | global batch size: 32 | lm loss: 6.513996E+00 | loss scale: 32768.0 | grad norm: 172559.158 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3303/ 159576 | consumed samples: 58448 | elapsed time per iteration (ms): 14557.7 | learning rate: 1.619E-05 | global batch size: 32 | lm loss: 6.688443E+00 | loss scale: 32768.0 | grad norm: 154181.843 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3304/ 159576 | consumed samples: 58480 | elapsed time per iteration (ms): 14541.6 | learning rate: 1.620E-05 | global batch size: 32 | lm loss: 6.865191E+00 | loss scale: 32768.0 | grad norm: 171141.876 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3305/ 159576 | consumed samples: 58512 | elapsed time per iteration (ms): 14558.8 | learning rate: 1.621E-05 | global batch size: 32 | lm loss: 6.529626E+00 | loss scale: 32768.0 | grad norm: 112641.720 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3306/ 159576 | consumed samples: 58544 | elapsed time per iteration (ms): 14971.5 | learning rate: 1.622E-05 | global batch size: 32 | lm loss: 6.571610E+00 | loss scale: 32768.0 | grad norm: 115411.324 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3307/ 159576 | consumed samples: 58576 | elapsed time per iteration (ms): 14532.6 | learning rate: 1.623E-05 | global batch size: 32 | lm loss: 6.792900E+00 | loss scale: 32768.0 | grad norm: 153224.669 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3308/ 159576 | consumed samples: 58608 | elapsed time per iteration (ms): 14639.5 | learning rate: 1.624E-05 | global batch size: 32 | lm loss: 6.490854E+00 | loss scale: 32768.0 | grad norm: 125276.183 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3309/ 159576 | consumed samples: 58640 | elapsed time per iteration (ms): 14639.4 | learning rate: 1.625E-05 | global batch size: 32 | lm loss: 6.604795E+00 | loss scale: 32768.0 | grad norm: 163307.330 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3310/ 159576 | consumed samples: 58672 | elapsed time per iteration (ms): 14641.3 | learning rate: 1.626E-05 | global batch size: 32 | lm loss: 6.486001E+00 | loss scale: 32768.0 | grad norm: 169732.209 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3311/ 159576 | consumed samples: 58704 | elapsed time per iteration (ms): 14763.3 | learning rate: 1.626E-05 | global batch size: 32 | lm loss: 6.513995E+00 | loss scale: 32768.0 | grad norm: 106129.577 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3312/ 159576 | consumed samples: 58736 | elapsed time per iteration (ms): 14481.4 | learning rate: 1.627E-05 | global batch size: 32 | lm loss: 6.538834E+00 | loss scale: 32768.0 | grad norm: 143827.047 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3313/ 159576 | consumed samples: 58768 | elapsed time per iteration (ms): 14535.0 | learning rate: 1.628E-05 | global batch size: 32 | lm loss: 6.508898E+00 | loss scale: 32768.0 | grad norm: 96517.736 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3314/ 159576 | consumed samples: 58800 | elapsed time per iteration (ms): 14389.3 | learning rate: 1.629E-05 | global batch size: 32 | lm loss: 6.557344E+00 | loss scale: 32768.0 | grad norm: 160647.640 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3315/ 159576 | consumed samples: 58832 | elapsed time per iteration (ms): 14617.9 | learning rate: 1.630E-05 | global batch size: 32 | lm loss: 6.579730E+00 | loss scale: 32768.0 | grad norm: 166511.304 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3316/ 159576 | consumed samples: 58864 | elapsed time per iteration (ms): 14527.6 | learning rate: 1.631E-05 | global batch size: 32 | lm loss: 6.510201E+00 | loss scale: 32768.0 | grad norm: 147882.179 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3317/ 159576 | consumed samples: 58896 | elapsed time per iteration (ms): 14470.3 | learning rate: 1.632E-05 | global batch size: 32 | lm loss: 6.570679E+00 | loss scale: 32768.0 | grad norm: 133948.873 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3318/ 159576 | consumed samples: 58928 | elapsed time per iteration (ms): 14503.9 | learning rate: 1.633E-05 | global batch size: 32 | lm loss: 6.505450E+00 | loss scale: 32768.0 | grad norm: 117987.444 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3319/ 159576 | consumed samples: 58960 | elapsed time per iteration (ms): 14576.7 | learning rate: 1.634E-05 | global batch size: 32 | lm loss: 6.637349E+00 | loss scale: 32768.0 | grad norm: 158753.005 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3320/ 159576 | consumed samples: 58992 | elapsed time per iteration (ms): 14474.5 | learning rate: 1.634E-05 | global batch size: 32 | lm loss: 6.463197E+00 | loss scale: 32768.0 | grad norm: 133223.814 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3321/ 159576 | consumed samples: 59024 | elapsed time per iteration (ms): 14495.2 | learning rate: 1.635E-05 | global batch size: 32 | lm loss: 6.754025E+00 | loss scale: 32768.0 | grad norm: 147882.857 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3322/ 159576 | consumed samples: 59056 | elapsed time per iteration (ms): 14426.8 | learning rate: 1.636E-05 | global batch size: 32 | lm loss: 6.377756E+00 | loss scale: 32768.0 | grad norm: 107176.477 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3323/ 159576 | consumed samples: 59088 | elapsed time per iteration (ms): 14894.2 | learning rate: 1.637E-05 | global batch size: 32 | lm loss: 6.485399E+00 | loss scale: 32768.0 | grad norm: 104276.979 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3324/ 159576 | consumed samples: 59120 | elapsed time per iteration (ms): 14539.8 | learning rate: 1.638E-05 | global batch size: 32 | lm loss: 6.595620E+00 | loss scale: 32768.0 | grad norm: 102253.453 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3325/ 159576 | consumed samples: 59152 | elapsed time per iteration (ms): 14528.7 | learning rate: 1.639E-05 | global batch size: 32 | lm loss: 6.372971E+00 | loss scale: 32768.0 | grad norm: 170203.107 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3326/ 159576 | consumed samples: 59184 | elapsed time per iteration (ms): 14629.3 | learning rate: 1.640E-05 | global batch size: 32 | lm loss: 6.460327E+00 | loss scale: 32768.0 | grad norm: 108888.403 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3327/ 159576 | consumed samples: 59216 | elapsed time per iteration (ms): 15011.9 | learning rate: 1.641E-05 | global batch size: 32 | lm loss: 6.462082E+00 | loss scale: 32768.0 | grad norm: 154915.863 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3328/ 159576 | consumed samples: 59248 | elapsed time per iteration (ms): 14457.0 | learning rate: 1.642E-05 | global batch size: 32 | lm loss: 6.526529E+00 | loss scale: 32768.0 | grad norm: 135486.854 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3329/ 159576 | consumed samples: 59280 | elapsed time per iteration (ms): 14493.0 | learning rate: 1.642E-05 | global batch size: 32 | lm loss: 6.546029E+00 | loss scale: 32768.0 | grad norm: 97252.237 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3330/ 159576 | consumed samples: 59312 | elapsed time per iteration (ms): 14488.7 | learning rate: 1.643E-05 | global batch size: 32 | lm loss: 6.540400E+00 | loss scale: 32768.0 | grad norm: 234564.204 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3331/ 159576 | consumed samples: 59344 | elapsed time per iteration (ms): 14982.7 | learning rate: 1.644E-05 | global batch size: 32 | lm loss: 6.473689E+00 | loss scale: 32768.0 | grad norm: 104411.499 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3332/ 159576 | consumed samples: 59376 | elapsed time per iteration (ms): 14455.1 | learning rate: 1.645E-05 | global batch size: 32 | lm loss: 6.589927E+00 | loss scale: 32768.0 | grad norm: 240696.234 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
[2021-09-24 15:06:48] PULSE: tr8-104B is waiting for the previous job to finish before scheduling a new one using the dependency mechanism (1165978_[1-10%1] on 'gpu_p13' partition)
[2021-09-24 15:06:48] PULSE: tr8-104B is running for 9:14:37 since 2021-09-24T05:52:11 (1162855_1 on 'gpu_p13' partition (r6i4n[5,7],r6i5n[2,7-8],r6i6n[0,2,6],r7i2n[4-5],r7i6n[2-4],r7i7n[7-8],r8i0n[2-3,5-8],r8i1n[0,2-4],r8i2n8,r8i3n[0-2],r8i5n[3-4],r8i7n[3-8],r9i0n[0-2],r9i1n[0-3],r9i2n[3-5,8],r9i3n[0-1,7-8],r9i4n[0-2],r9i5n[3-8],r9i6n[0,7-8])
iteration 3333/ 159576 | consumed samples: 59408 | elapsed time per iteration (ms): 14571.6 | learning rate: 1.646E-05 | global batch size: 32 | lm loss: 6.604051E+00 | loss scale: 32768.0 | grad norm: 150869.363 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3334/ 159576 | consumed samples: 59440 | elapsed time per iteration (ms): 14495.5 | learning rate: 1.647E-05 | global batch size: 32 | lm loss: 6.565775E+00 | loss scale: 32768.0 | grad norm: 141203.105 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3335/ 159576 | consumed samples: 59472 | elapsed time per iteration (ms): 14896.4 | learning rate: 1.648E-05 | global batch size: 32 | lm loss: 6.456505E+00 | loss scale: 32768.0 | grad norm: 145244.969 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3336/ 159576 | consumed samples: 59504 | elapsed time per iteration (ms): 14515.3 | learning rate: 1.649E-05 | global batch size: 32 | lm loss: 6.488969E+00 | loss scale: 32768.0 | grad norm: 246097.062 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3337/ 159576 | consumed samples: 59536 | elapsed time per iteration (ms): 14492.7 | learning rate: 1.650E-05 | global batch size: 32 | lm loss: 6.455498E+00 | loss scale: 32768.0 | grad norm: 130955.193 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3338/ 159576 | consumed samples: 59568 | elapsed time per iteration (ms): 14531.1 | learning rate: 1.650E-05 | global batch size: 32 | lm loss: 6.593586E+00 | loss scale: 32768.0 | grad norm: 136721.801 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3339/ 159576 | consumed samples: 59600 | elapsed time per iteration (ms): 14962.3 | learning rate: 1.651E-05 | global batch size: 32 | lm loss: 6.564628E+00 | loss scale: 32768.0 | grad norm: 141976.785 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3340/ 159576 | consumed samples: 59632 | elapsed time per iteration (ms): 14550.8 | learning rate: 1.652E-05 | global batch size: 32 | lm loss: 6.373518E+00 | loss scale: 32768.0 | grad norm: 113008.324 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3341/ 159576 | consumed samples: 59664 | elapsed time per iteration (ms): 14563.2 | learning rate: 1.653E-05 | global batch size: 32 | lm loss: 6.658302E+00 | loss scale: 32768.0 | grad norm: 113653.366 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3342/ 159576 | consumed samples: 59696 | elapsed time per iteration (ms): 14584.3 | learning rate: 1.654E-05 | global batch size: 32 | lm loss: 6.485311E+00 | loss scale: 32768.0 | grad norm: 162130.401 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3343/ 159576 | consumed samples: 59728 | elapsed time per iteration (ms): 14879.0 | learning rate: 1.655E-05 | global batch size: 32 | lm loss: 6.461338E+00 | loss scale: 32768.0 | grad norm: 284392.029 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3344/ 159576 | consumed samples: 59760 | elapsed time per iteration (ms): 14679.3 | learning rate: 1.656E-05 | global batch size: 32 | lm loss: 6.473630E+00 | loss scale: 32768.0 | grad norm: 142043.769 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3345/ 159576 | consumed samples: 59792 | elapsed time per iteration (ms): 14580.5 | learning rate: 1.657E-05 | global batch size: 32 | lm loss: 6.494667E+00 | loss scale: 32768.0 | grad norm: 125366.936 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3346/ 159576 | consumed samples: 59824 | elapsed time per iteration (ms): 14552.3 | learning rate: 1.658E-05 | global batch size: 32 | lm loss: 6.560155E+00 | loss scale: 32768.0 | grad norm: 126654.040 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3347/ 159576 | consumed samples: 59856 | elapsed time per iteration (ms): 14707.5 | learning rate: 1.658E-05 | global batch size: 32 | lm loss: 6.462931E+00 | loss scale: 32768.0 | grad norm: 123122.209 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3348/ 159576 | consumed samples: 59888 | elapsed time per iteration (ms): 14897.9 | learning rate: 1.659E-05 | global batch size: 32 | lm loss: 6.542427E+00 | loss scale: 32768.0 | grad norm: 147629.605 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3349/ 159576 | consumed samples: 59920 | elapsed time per iteration (ms): 14638.7 | learning rate: 1.660E-05 | global batch size: 32 | lm loss: 6.508281E+00 | loss scale: 32768.0 | grad norm: 181625.911 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3350/ 159576 | consumed samples: 59952 | elapsed time per iteration (ms): 14590.8 | learning rate: 1.661E-05 | global batch size: 32 | lm loss: 6.592540E+00 | loss scale: 32768.0 | grad norm: 161023.762 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3351/ 159576 | consumed samples: 59984 | elapsed time per iteration (ms): 14484.6 | learning rate: 1.662E-05 | global batch size: 32 | lm loss: 6.474733E+00 | loss scale: 32768.0 | grad norm: 125810.881 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3352/ 159576 | consumed samples: 60016 | elapsed time per iteration (ms): 14782.0 | learning rate: 1.663E-05 | global batch size: 32 | lm loss: 6.515071E+00 | loss scale: 32768.0 | grad norm: 148493.240 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3353/ 159576 | consumed samples: 60048 | elapsed time per iteration (ms): 14601.7 | learning rate: 1.664E-05 | global batch size: 32 | lm loss: 6.510946E+00 | loss scale: 32768.0 | grad norm: 154098.157 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3354/ 159576 | consumed samples: 60080 | elapsed time per iteration (ms): 14551.7 | learning rate: 1.665E-05 | global batch size: 32 | lm loss: 6.639778E+00 | loss scale: 32768.0 | grad norm: 120125.841 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3355/ 159576 | consumed samples: 60112 | elapsed time per iteration (ms): 14609.6 | learning rate: 1.666E-05 | global batch size: 32 | lm loss: 6.582976E+00 | loss scale: 32768.0 | grad norm: 125934.744 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3356/ 159576 | consumed samples: 60144 | elapsed time per iteration (ms): 14773.2 | learning rate: 1.666E-05 | global batch size: 32 | lm loss: 6.492831E+00 | loss scale: 32768.0 | grad norm: 114199.841 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3357/ 159576 | consumed samples: 60176 | elapsed time per iteration (ms): 14529.3 | learning rate: 1.667E-05 | global batch size: 32 | lm loss: 6.348350E+00 | loss scale: 32768.0 | grad norm: 224039.321 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3358/ 159576 | consumed samples: 60208 | elapsed time per iteration (ms): 14555.6 | learning rate: 1.668E-05 | global batch size: 32 | lm loss: 6.556470E+00 | loss scale: 32768.0 | grad norm: 104992.577 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3359/ 159576 | consumed samples: 60240 | elapsed time per iteration (ms): 14550.6 | learning rate: 1.669E-05 | global batch size: 32 | lm loss: 6.499870E+00 | loss scale: 32768.0 | grad norm: 135382.686 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3360/ 159576 | consumed samples: 60272 | elapsed time per iteration (ms): 14838.2 | learning rate: 1.670E-05 | global batch size: 32 | lm loss: 6.482747E+00 | loss scale: 32768.0 | grad norm: 128815.280 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3361/ 159576 | consumed samples: 60304 | elapsed time per iteration (ms): 14577.3 | learning rate: 1.671E-05 | global batch size: 32 | lm loss: 6.564407E+00 | loss scale: 32768.0 | grad norm: 220163.959 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3362/ 159576 | consumed samples: 60336 | elapsed time per iteration (ms): 14600.9 | learning rate: 1.672E-05 | global batch size: 32 | lm loss: 6.561186E+00 | loss scale: 32768.0 | grad norm: 110111.851 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3363/ 159576 | consumed samples: 60368 | elapsed time per iteration (ms): 14665.2 | learning rate: 1.673E-05 | global batch size: 32 | lm loss: 6.624823E+00 | loss scale: 32768.0 | grad norm: 119091.671 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3364/ 159576 | consumed samples: 60400 | elapsed time per iteration (ms): 14799.6 | learning rate: 1.674E-05 | global batch size: 32 | lm loss: 6.572470E+00 | loss scale: 32768.0 | grad norm: 157986.014 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3365/ 159576 | consumed samples: 60432 | elapsed time per iteration (ms): 14663.0 | learning rate: 1.674E-05 | global batch size: 32 | lm loss: 6.613792E+00 | loss scale: 32768.0 | grad norm: 103982.392 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3366/ 159576 | consumed samples: 60464 | elapsed time per iteration (ms): 14481.2 | learning rate: 1.675E-05 | global batch size: 32 | lm loss: 6.387408E+00 | loss scale: 32768.0 | grad norm: 158220.191 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3367/ 159576 | consumed samples: 60496 | elapsed time per iteration (ms): 14521.1 | learning rate: 1.676E-05 | global batch size: 32 | lm loss: 6.515392E+00 | loss scale: 32768.0 | grad norm: 123622.262 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3368/ 159576 | consumed samples: 60528 | elapsed time per iteration (ms): 15053.7 | learning rate: 1.677E-05 | global batch size: 32 | lm loss: 6.568096E+00 | loss scale: 32768.0 | grad norm: 255456.390 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3369/ 159576 | consumed samples: 60560 | elapsed time per iteration (ms): 14696.0 | learning rate: 1.678E-05 | global batch size: 32 | lm loss: 6.553046E+00 | loss scale: 32768.0 | grad norm: 144928.658 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3370/ 159576 | consumed samples: 60592 | elapsed time per iteration (ms): 14594.8 | learning rate: 1.679E-05 | global batch size: 32 | lm loss: 6.341058E+00 | loss scale: 32768.0 | grad norm: 190527.660 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3371/ 159576 | consumed samples: 60624 | elapsed time per iteration (ms): 14611.4 | learning rate: 1.680E-05 | global batch size: 32 | lm loss: 6.406933E+00 | loss scale: 32768.0 | grad norm: 164464.189 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3372/ 159576 | consumed samples: 60656 | elapsed time per iteration (ms): 14997.7 | learning rate: 1.681E-05 | global batch size: 32 | lm loss: 6.472693E+00 | loss scale: 32768.0 | grad norm: 140499.535 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3373/ 159576 | consumed samples: 60688 | elapsed time per iteration (ms): 14555.5 | learning rate: 1.682E-05 | global batch size: 32 | lm loss: 6.472823E+00 | loss scale: 32768.0 | grad norm: 209200.754 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3374/ 159576 | consumed samples: 60720 | elapsed time per iteration (ms): 14538.5 | learning rate: 1.682E-05 | global batch size: 32 | lm loss: 6.575472E+00 | loss scale: 32768.0 | grad norm: 152311.741 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3375/ 159576 | consumed samples: 60752 | elapsed time per iteration (ms): 14542.0 | learning rate: 1.683E-05 | global batch size: 32 | lm loss: 6.559402E+00 | loss scale: 32768.0 | grad norm: 139207.917 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3376/ 159576 | consumed samples: 60784 | elapsed time per iteration (ms): 14908.5 | learning rate: 1.684E-05 | global batch size: 32 | lm loss: 6.450352E+00 | loss scale: 32768.0 | grad norm: 132808.916 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3377/ 159576 | consumed samples: 60816 | elapsed time per iteration (ms): 14576.3 | learning rate: 1.685E-05 | global batch size: 32 | lm loss: 6.365215E+00 | loss scale: 32768.0 | grad norm: 176292.223 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3378/ 159576 | consumed samples: 60848 | elapsed time per iteration (ms): 14602.1 | learning rate: 1.686E-05 | global batch size: 32 | lm loss: 6.443403E+00 | loss scale: 32768.0 | grad norm: 123052.341 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3379/ 159576 | consumed samples: 60880 | elapsed time per iteration (ms): 14651.7 | learning rate: 1.687E-05 | global batch size: 32 | lm loss: 6.502498E+00 | loss scale: 32768.0 | grad norm: 100381.015 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3380/ 159576 | consumed samples: 60912 | elapsed time per iteration (ms): 14854.4 | learning rate: 1.688E-05 | global batch size: 32 | lm loss: 6.296595E+00 | loss scale: 32768.0 | grad norm: 110161.712 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3381/ 159576 | consumed samples: 60944 | elapsed time per iteration (ms): 14541.8 | learning rate: 1.689E-05 | global batch size: 32 | lm loss: 6.563570E+00 | loss scale: 32768.0 | grad norm: 88591.921 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 
3382/ 159576 | consumed samples: 60976 | elapsed time per iteration (ms): 14608.6 | learning rate: 1.689E-05 | global batch size: 32 | lm loss: 6.582268E+00 | loss scale: 32768.0 | grad norm: 114214.218 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3383/ 159576 | consumed samples: 61008 | elapsed time per iteration (ms): 14527.6 | learning rate: 1.690E-05 | global batch size: 32 | lm loss: 6.577205E+00 | loss scale: 32768.0 | grad norm: 122437.521 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3384/ 159576 | consumed samples: 61040 | elapsed time per iteration (ms): 14914.6 | learning rate: 1.691E-05 | global batch size: 32 | lm loss: 6.428950E+00 | loss scale: 32768.0 | grad norm: 125848.491 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3385/ 159576 | consumed samples: 61072 | elapsed time per iteration (ms): 14662.1 | learning rate: 1.692E-05 | global batch size: 32 | lm loss: 6.677817E+00 | loss scale: 32768.0 | grad norm: 110496.306 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3386/ 159576 | consumed samples: 61104 | elapsed time per iteration (ms): 14566.3 | learning rate: 1.693E-05 | global batch size: 32 | lm loss: 6.704777E+00 | loss scale: 32768.0 | grad norm: 128540.385 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3387/ 159576 | consumed samples: 61136 | elapsed time per iteration (ms): 14563.5 | learning rate: 1.694E-05 | global batch size: 32 | lm loss: 6.578674E+00 | loss scale: 32768.0 | grad norm: 143780.108 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3388/ 159576 | consumed samples: 61168 | elapsed time per iteration (ms): 14890.7 | learning rate: 1.695E-05 | global batch size: 32 | lm loss: 6.503931E+00 | loss 
scale: 32768.0 | grad norm: 144574.375 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3389/ 159576 | consumed samples: 61200 | elapsed time per iteration (ms): 14672.5 | learning rate: 1.696E-05 | global batch size: 32 | lm loss: 6.662019E+00 | loss scale: 32768.0 | grad norm: 158358.181 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3390/ 159576 | consumed samples: 61232 | elapsed time per iteration (ms): 14563.8 | learning rate: 1.697E-05 | global batch size: 32 | lm loss: 6.577336E+00 | loss scale: 32768.0 | grad norm: 198110.226 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3391/ 159576 | consumed samples: 61264 | elapsed time per iteration (ms): 14556.6 | learning rate: 1.697E-05 | global batch size: 32 | lm loss: 6.480102E+00 | loss scale: 32768.0 | grad norm: 131120.843 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3392/ 159576 | consumed samples: 61296 | elapsed time per iteration (ms): 14679.5 | learning rate: 1.698E-05 | global batch size: 32 | lm loss: 6.610832E+00 | loss scale: 32768.0 | grad norm: 164581.156 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3393/ 159576 | consumed samples: 61328 | elapsed time per iteration (ms): 14940.6 | learning rate: 1.699E-05 | global batch size: 32 | lm loss: 6.591301E+00 | loss scale: 32768.0 | grad norm: 109544.075 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3394/ 159576 | consumed samples: 61360 | elapsed time per iteration (ms): 14592.5 | learning rate: 1.700E-05 | global batch size: 32 | lm loss: 6.572402E+00 | loss scale: 32768.0 | grad norm: 121937.240 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3395/ 159576 | 
consumed samples: 61392 | elapsed time per iteration (ms): 14696.4 | learning rate: 1.701E-05 | global batch size: 32 | lm loss: 6.509333E+00 | loss scale: 32768.0 | grad norm: 125128.401 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3396/ 159576 | consumed samples: 61424 | elapsed time per iteration (ms): 14508.0 | learning rate: 1.702E-05 | global batch size: 32 | lm loss: 6.481079E+00 | loss scale: 32768.0 | grad norm: 111910.670 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3397/ 159576 | consumed samples: 61456 | elapsed time per iteration (ms): 14790.4 | learning rate: 1.703E-05 | global batch size: 32 | lm loss: 6.548109E+00 | loss scale: 32768.0 | grad norm: 98717.457 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3398/ 159576 | consumed samples: 61488 | elapsed time per iteration (ms): 14622.0 | learning rate: 1.704E-05 | global batch size: 32 | lm loss: 6.769459E+00 | loss scale: 32768.0 | grad norm: 117754.948 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3399/ 159576 | consumed samples: 61520 | elapsed time per iteration (ms): 14611.9 | learning rate: 1.705E-05 | global batch size: 32 | lm loss: 6.555518E+00 | loss scale: 32768.0 | grad norm: 122435.114 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3400/ 159576 | consumed samples: 61552 | elapsed time per iteration (ms): 14673.6 | learning rate: 1.705E-05 | global batch size: 32 | lm loss: 6.464739E+00 | loss scale: 32768.0 | grad norm: 119112.548 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3401/ 159576 | consumed samples: 61584 | elapsed time per iteration (ms): 14910.7 | learning rate: 1.706E-05 | global batch size: 32 | lm loss: 6.473111E+00 | loss scale: 32768.0 
| grad norm: 113410.819 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3402/ 159576 | consumed samples: 61616 | elapsed time per iteration (ms): 14645.2 | learning rate: 1.707E-05 | global batch size: 32 | lm loss: 6.476302E+00 | loss scale: 32768.0 | grad norm: 113730.379 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3403/ 159576 | consumed samples: 61648 | elapsed time per iteration (ms): 14580.6 | learning rate: 1.708E-05 | global batch size: 32 | lm loss: 6.449226E+00 | loss scale: 32768.0 | grad norm: 82819.459 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3404/ 159576 | consumed samples: 61680 | elapsed time per iteration (ms): 14600.7 | learning rate: 1.709E-05 | global batch size: 32 | lm loss: 6.560233E+00 | loss scale: 32768.0 | grad norm: 134696.405 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3405/ 159576 | consumed samples: 61712 | elapsed time per iteration (ms): 14772.7 | learning rate: 1.710E-05 | global batch size: 32 | lm loss: 6.546908E+00 | loss scale: 32768.0 | grad norm: 101163.521 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3406/ 159576 | consumed samples: 61744 | elapsed time per iteration (ms): 14593.3 | learning rate: 1.711E-05 | global batch size: 32 | lm loss: 6.541033E+00 | loss scale: 32768.0 | grad norm: 109699.529 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3407/ 159576 | consumed samples: 61776 | elapsed time per iteration (ms): 14624.0 | learning rate: 1.712E-05 | global batch size: 32 | lm loss: 6.511957E+00 | loss scale: 32768.0 | grad norm: 91123.954 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3408/ 159576 | consumed samples: 61808 
| elapsed time per iteration (ms): 14724.5 | learning rate: 1.713E-05 | global batch size: 32 | lm loss: 6.628172E+00 | loss scale: 32768.0 | grad norm: 121584.252 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3409/ 159576 | consumed samples: 61840 | elapsed time per iteration (ms): 15120.6 | learning rate: 1.713E-05 | global batch size: 32 | lm loss: 6.578444E+00 | loss scale: 32768.0 | grad norm: 116757.586 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3410/ 159576 | consumed samples: 61872 | elapsed time per iteration (ms): 14619.5 | learning rate: 1.714E-05 | global batch size: 32 | lm loss: 6.415488E+00 | loss scale: 32768.0 | grad norm: 105815.444 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3411/ 159576 | consumed samples: 61904 | elapsed time per iteration (ms): 14577.8 | learning rate: 1.715E-05 | global batch size: 32 | lm loss: 6.553544E+00 | loss scale: 32768.0 | grad norm: 104053.489 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3412/ 159576 | consumed samples: 61936 | elapsed time per iteration (ms): 14587.5 | learning rate: 1.716E-05 | global batch size: 32 | lm loss: 6.435183E+00 | loss scale: 32768.0 | grad norm: 101905.898 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3413/ 159576 | consumed samples: 61968 | elapsed time per iteration (ms): 14985.9 | learning rate: 1.717E-05 | global batch size: 32 | lm loss: 6.580218E+00 | loss scale: 32768.0 | grad norm: 142325.290 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3414/ 159576 | consumed samples: 62000 | elapsed time per iteration (ms): 14646.8 | learning rate: 1.718E-05 | global batch size: 32 | lm loss: 6.534802E+00 | loss scale: 32768.0 | grad norm: 109771.164 
| num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3415/ 159576 | consumed samples: 62032 | elapsed time per iteration (ms): 14644.6 | learning rate: 1.719E-05 | global batch size: 32 | lm loss: 6.582119E+00 | loss scale: 32768.0 | grad norm: 192056.720 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3416/ 159576 | consumed samples: 62064 | elapsed time per iteration (ms): 14616.1 | learning rate: 1.720E-05 | global batch size: 32 | lm loss: 6.496407E+00 | loss scale: 32768.0 | grad norm: 118953.837 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3417/ 159576 | consumed samples: 62096 | elapsed time per iteration (ms): 15113.2 | learning rate: 1.721E-05 | global batch size: 32 | lm loss: 6.475505E+00 | loss scale: 32768.0 | grad norm: 173828.473 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3418/ 159576 | consumed samples: 62128 | elapsed time per iteration (ms): 14635.6 | learning rate: 1.721E-05 | global batch size: 32 | lm loss: 6.318462E+00 | loss scale: 32768.0 | grad norm: 147925.562 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3419/ 159576 | consumed samples: 62160 | elapsed time per iteration (ms): 14611.3 | learning rate: 1.722E-05 | global batch size: 32 | lm loss: 6.571759E+00 | loss scale: 32768.0 | grad norm: 112885.924 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3420/ 159576 | consumed samples: 62192 | elapsed time per iteration (ms): 14573.5 | learning rate: 1.723E-05 | global batch size: 32 | lm loss: 6.461047E+00 | loss scale: 32768.0 | grad norm: 135373.791 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3421/ 159576 | consumed samples: 62224 | elapsed time per 
iteration (ms): 14978.7 | learning rate: 1.724E-05 | global batch size: 32 | lm loss: 6.554849E+00 | loss scale: 32768.0 | grad norm: 162048.264 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3422/ 159576 | consumed samples: 62256 | elapsed time per iteration (ms): 14574.6 | learning rate: 1.725E-05 | global batch size: 32 | lm loss: 6.443440E+00 | loss scale: 32768.0 | grad norm: 103393.805 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3423/ 159576 | consumed samples: 62288 | elapsed time per iteration (ms): 14578.8 | learning rate: 1.726E-05 | global batch size: 32 | lm loss: 6.490220E+00 | loss scale: 32768.0 | grad norm: 217891.504 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3424/ 159576 | consumed samples: 62320 | elapsed time per iteration (ms): 14669.3 | learning rate: 1.727E-05 | global batch size: 32 | lm loss: 6.475744E+00 | loss scale: 32768.0 | grad norm: 132019.548 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3425/ 159576 | consumed samples: 62352 | elapsed time per iteration (ms): 15003.7 | learning rate: 1.728E-05 | global batch size: 32 | lm loss: 6.639316E+00 | loss scale: 32768.0 | grad norm: 118549.933 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3426/ 159576 | consumed samples: 62384 | elapsed time per iteration (ms): 14473.5 | learning rate: 1.729E-05 | global batch size: 32 | lm loss: 6.529860E+00 | loss scale: 32768.0 | grad norm: 110134.510 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3427/ 159576 | consumed samples: 62416 | elapsed time per iteration (ms): 14593.0 | learning rate: 1.729E-05 | global batch size: 32 | lm loss: 6.424025E+00 | loss scale: 32768.0 | grad norm: 96948.279 | num zeros: 0.0 | 
number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3428/ 159576 | consumed samples: 62448 | elapsed time per iteration (ms): 14574.8 | learning rate: 1.730E-05 | global batch size: 32 | lm loss: 6.603945E+00 | loss scale: 32768.0 | grad norm: 108813.419 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3429/ 159576 | consumed samples: 62480 | elapsed time per iteration (ms): 14962.4 | learning rate: 1.731E-05 | global batch size: 32 | lm loss: 6.519920E+00 | loss scale: 32768.0 | grad norm: 120997.396 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3430/ 159576 | consumed samples: 62512 | elapsed time per iteration (ms): 14606.5 | learning rate: 1.732E-05 | global batch size: 32 | lm loss: 6.519583E+00 | loss scale: 32768.0 | grad norm: 102226.597 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3431/ 159576 | consumed samples: 62544 | elapsed time per iteration (ms): 14685.5 | learning rate: 1.733E-05 | global batch size: 32 | lm loss: 6.413152E+00 | loss scale: 32768.0 | grad norm: 146442.757 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3432/ 159576 | consumed samples: 62576 | elapsed time per iteration (ms): 14642.7 | learning rate: 1.734E-05 | global batch size: 32 | lm loss: 6.416885E+00 | loss scale: 32768.0 | grad norm: 106692.633 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3433/ 159576 | consumed samples: 62608 | elapsed time per iteration (ms): 14943.4 | learning rate: 1.735E-05 | global batch size: 32 | lm loss: 6.684166E+00 | loss scale: 32768.0 | grad norm: 122647.780 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3434/ 159576 | consumed samples: 62640 | elapsed time per iteration (ms): 
14559.8 | learning rate: 1.736E-05 | global batch size: 32 | lm loss: 6.582661E+00 | loss scale: 32768.0 | grad norm: 143037.633 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3435/ 159576 | consumed samples: 62672 | elapsed time per iteration (ms): 14581.0 | learning rate: 1.737E-05 | global batch size: 32 | lm loss: 6.459047E+00 | loss scale: 32768.0 | grad norm: 139754.449 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3436/ 159576 | consumed samples: 62704 | elapsed time per iteration (ms): 14594.3 | learning rate: 1.737E-05 | global batch size: 32 | lm loss: 6.455495E+00 | loss scale: 32768.0 | grad norm: 199133.358 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3437/ 159576 | consumed samples: 62736 | elapsed time per iteration (ms): 14983.6 | learning rate: 1.738E-05 | global batch size: 32 | lm loss: 6.507184E+00 | loss scale: 32768.0 | grad norm: 193681.925 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3438/ 159576 | consumed samples: 62768 | elapsed time per iteration (ms): 14797.2 | learning rate: 1.739E-05 | global batch size: 32 | lm loss: 6.461359E+00 | loss scale: 32768.0 | grad norm: 132732.709 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3439/ 159576 | consumed samples: 62800 | elapsed time per iteration (ms): 14579.8 | learning rate: 1.740E-05 | global batch size: 32 | lm loss: 6.704415E+00 | loss scale: 32768.0 | grad norm: 113391.882 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3440/ 159576 | consumed samples: 62832 | elapsed time per iteration (ms): 14621.6 | learning rate: 1.741E-05 | global batch size: 32 | lm loss: 6.473897E+00 | loss scale: 32768.0 | grad norm: 120849.572 | num zeros: 0.0 | number of 
skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3441/ 159576 | consumed samples: 62864 | elapsed time per iteration (ms): 14686.1 | learning rate: 1.742E-05 | global batch size: 32 | lm loss: 6.459955E+00 | loss scale: 32768.0 | grad norm: 128216.917 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3442/ 159576 | consumed samples: 62896 | elapsed time per iteration (ms): 14857.9 | learning rate: 1.743E-05 | global batch size: 32 | lm loss: 6.424060E+00 | loss scale: 32768.0 | grad norm: 102672.871 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3443/ 159576 | consumed samples: 62928 | elapsed time per iteration (ms): 14570.1 | learning rate: 1.744E-05 | global batch size: 32 | lm loss: 6.534360E+00 | loss scale: 32768.0 | grad norm: 184877.887 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3444/ 159576 | consumed samples: 62960 | elapsed time per iteration (ms): 14620.2 | learning rate: 1.745E-05 | global batch size: 32 | lm loss: 6.629717E+00 | loss scale: 32768.0 | grad norm: 138408.073 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3445/ 159576 | consumed samples: 62992 | elapsed time per iteration (ms): 14619.1 | learning rate: 1.745E-05 | global batch size: 32 | lm loss: 6.494986E+00 | loss scale: 32768.0 | grad norm: 131634.897 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3446/ 159576 | consumed samples: 63024 | elapsed time per iteration (ms): 14739.8 | learning rate: 1.746E-05 | global batch size: 32 | lm loss: 6.529834E+00 | loss scale: 32768.0 | grad norm: 190204.428 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3447/ 159576 | consumed samples: 63056 | elapsed time per iteration (ms): 14575.9 | 
learning rate: 1.747E-05 | global batch size: 32 | lm loss: 6.519164E+00 | loss scale: 32768.0 | grad norm: 190893.633 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3448/ 159576 | consumed samples: 63088 | elapsed time per iteration (ms): 14611.0 | learning rate: 1.748E-05 | global batch size: 32 | lm loss: 6.431557E+00 | loss scale: 32768.0 | grad norm: 127326.623 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3449/ 159576 | consumed samples: 63120 | elapsed time per iteration (ms): 14615.1 | learning rate: 1.749E-05 | global batch size: 32 | lm loss: 6.213955E+00 | loss scale: 32768.0 | grad norm: 149485.955 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3450/ 159576 | consumed samples: 63152 | elapsed time per iteration (ms): 14697.2 | learning rate: 1.750E-05 | global batch size: 32 | lm loss: 6.669972E+00 | loss scale: 32768.0 | grad norm: 121418.512 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3451/ 159576 | consumed samples: 63184 | elapsed time per iteration (ms): 14506.2 | learning rate: 1.751E-05 | global batch size: 32 | lm loss: 6.538607E+00 | loss scale: 32768.0 | grad norm: 160228.418 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3452/ 159576 | consumed samples: 63216 | elapsed time per iteration (ms): 14518.4 | learning rate: 1.752E-05 | global batch size: 32 | lm loss: 6.466623E+00 | loss scale: 32768.0 | grad norm: 132558.400 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3453/ 159576 | consumed samples: 63248 | elapsed time per iteration (ms): 14654.4 | learning rate: 1.753E-05 | global batch size: 32 | lm loss: 6.575057E+00 | loss scale: 32768.0 | grad norm: 126715.953 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3454/ 159576 | consumed samples: 63280 | elapsed time per iteration (ms): 14975.6 | learning rate: 1.753E-05 | global batch size: 32 | lm loss: 6.469002E+00 | loss scale: 32768.0 | grad norm: 134315.470 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3455/ 159576 | consumed samples: 63312 | elapsed time per iteration (ms): 14595.3 | learning rate: 1.754E-05 | global batch size: 32 | lm loss: 6.471159E+00 | loss scale: 32768.0 | grad norm: 132183.538 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3456/ 159576 | consumed samples: 63344 | elapsed time per iteration (ms): 14624.6 | learning rate: 1.755E-05 | global batch size: 32 | lm loss: 6.390759E+00 | loss scale: 32768.0 | grad norm: 168993.753 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3457/ 159576 | consumed samples: 63376 | elapsed time per iteration (ms): 14611.9 | learning rate: 1.756E-05 | global batch size: 32 | lm loss: 6.545074E+00 | loss scale: 32768.0 | grad norm: 116907.132 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3458/ 159576 | consumed samples: 63408 | elapsed time per iteration (ms): 14991.7 | learning rate: 1.757E-05 | global batch size: 32 | lm loss: 6.541002E+00 | loss scale: 32768.0 | grad norm: 144421.845 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3459/ 159576 | consumed samples: 63440 | elapsed time per iteration (ms): 14690.5 | learning rate: 1.758E-05 | global batch size: 32 | lm loss: 6.549660E+00 | loss scale: 32768.0 | grad norm: 177618.434 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3460/ 159576 | consumed samples: 63472 | elapsed time per iteration (ms): 14572.5 | learning 
rate: 1.759E-05 | global batch size: 32 | lm loss: 6.509130E+00 | loss scale: 32768.0 | grad norm: 102216.190 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3461/ 159576 | consumed samples: 63504 | elapsed time per iteration (ms): 14630.9 | learning rate: 1.760E-05 | global batch size: 32 | lm loss: 6.474805E+00 | loss scale: 32768.0 | grad norm: 198903.879 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3462/ 159576 | consumed samples: 63536 | elapsed time per iteration (ms): 14903.4 | learning rate: 1.761E-05 | global batch size: 32 | lm loss: 6.343786E+00 | loss scale: 32768.0 | grad norm: 142714.038 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3463/ 159576 | consumed samples: 63568 | elapsed time per iteration (ms): 14638.9 | learning rate: 1.761E-05 | global batch size: 32 | lm loss: 6.644784E+00 | loss scale: 32768.0 | grad norm: 158591.280 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3464/ 159576 | consumed samples: 63600 | elapsed time per iteration (ms): 14613.0 | learning rate: 1.762E-05 | global batch size: 32 | lm loss: 6.625895E+00 | loss scale: 32768.0 | grad norm: 123320.343 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3465/ 159576 | consumed samples: 63632 | elapsed time per iteration (ms): 14585.1 | learning rate: 1.763E-05 | global batch size: 32 | lm loss: 6.575481E+00 | loss scale: 32768.0 | grad norm: 175492.554 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3466/ 159576 | consumed samples: 63664 | elapsed time per iteration (ms): 15007.9 | learning rate: 1.764E-05 | global batch size: 32 | lm loss: 6.510527E+00 | loss scale: 32768.0 | grad norm: 141462.343 | num zeros: 0.0 | number of skipped iterations: 0 | 
number of nan iterations: 0 | time (ms)
iteration 3467/ 159576 | consumed samples: 63696 | elapsed time per iteration (ms): 14658.4 | learning rate: 1.765E-05 | global batch size: 32 | lm loss: 6.281921E+00 | loss scale: 32768.0 | grad norm: 133404.006 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3468/ 159576 | consumed samples: 63728 | elapsed time per iteration (ms): 14580.1 | learning rate: 1.766E-05 | global batch size: 32 | lm loss: 6.438425E+00 | loss scale: 32768.0 | grad norm: 155340.501 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3469/ 159576 | consumed samples: 63760 | elapsed time per iteration (ms): 14575.6 | learning rate: 1.767E-05 | global batch size: 32 | lm loss: 6.527649E+00 | loss scale: 32768.0 | grad norm: 99587.133 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3470/ 159576 | consumed samples: 63792 | elapsed time per iteration (ms): 14895.6 | learning rate: 1.768E-05 | global batch size: 32 | lm loss: 6.196751E+00 | loss scale: 32768.0 | grad norm: 208702.232 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3471/ 159576 | consumed samples: 63824 | elapsed time per iteration (ms): 14601.7 | learning rate: 1.768E-05 | global batch size: 32 | lm loss: 6.487125E+00 | loss scale: 32768.0 | grad norm: 168900.933 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3472/ 159576 | consumed samples: 63856 | elapsed time per iteration (ms): 14566.0 | learning rate: 1.769E-05 | global batch size: 32 | lm loss: 6.509688E+00 | loss scale: 32768.0 | grad norm: 154921.949 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3473/ 159576 | consumed samples: 63888 | elapsed time per iteration (ms): 14575.1 | learning rate: 1.770E-05 | global batch size: 32 | lm loss: 6.622843E+00 | loss scale: 32768.0 | grad norm: 140472.596 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3474/ 159576 | consumed samples: 63920 | elapsed time per iteration (ms): 14877.5 | learning rate: 1.771E-05 | global batch size: 32 | lm loss: 6.475362E+00 | loss scale: 32768.0 | grad norm: 119718.275 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3475/ 159576 | consumed samples: 63952 | elapsed time per iteration (ms): 14552.0 | learning rate: 1.772E-05 | global batch size: 32 | lm loss: 6.465285E+00 | loss scale: 32768.0 | grad norm: 172671.121 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3476/ 159576 | consumed samples: 63984 | elapsed time per iteration (ms): 14582.7 | learning rate: 1.773E-05 | global batch size: 32 | lm loss: 6.389154E+00 | loss scale: 32768.0 | grad norm: 113417.369 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3477/ 159576 | consumed samples: 64016 | elapsed time per iteration (ms): 14606.6 | learning rate: 1.774E-05 | global batch size: 32 | lm loss: 6.582153E+00 | loss scale: 32768.0 | grad norm: 139244.123 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3478/ 159576 | consumed samples: 64048 | elapsed time per iteration (ms): 14915.2 | learning rate: 1.775E-05 | global batch size: 32 | lm loss: 6.490180E+00 | loss scale: 32768.0 | grad norm: 94281.862 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3479/ 159576 | consumed samples: 64080 | elapsed time per iteration (ms): 14555.1 | learning rate: 1.776E-05 | global batch size: 32 | lm loss: 6.683810E+00 | loss scale: 32768.0 | grad norm: 149137.080 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3480/ 159576 | consumed samples: 64112 | elapsed time per iteration (ms): 14553.1 | learning rate: 1.776E-05 | global batch size: 32 | lm loss: 6.534214E+00 | loss scale: 32768.0 | grad norm: 129169.136 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3481/ 159576 | consumed samples: 64144 | elapsed time per iteration (ms): 14603.3 | learning rate: 1.777E-05 | global batch size: 32 | lm loss: 6.581446E+00 | loss scale: 32768.0 | grad norm: 115991.644 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3482/ 159576 | consumed samples: 64176 | elapsed time per iteration (ms): 14916.9 | learning rate: 1.778E-05 | global batch size: 32 | lm loss: 6.567008E+00 | loss scale: 32768.0 | grad norm: 184960.532 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3483/ 159576 | consumed samples: 64208 | elapsed time per iteration (ms): 14481.2 | learning rate: 1.779E-05 | global batch size: 32 | lm loss: 6.662760E+00 | loss scale: 32768.0 | grad norm: 134077.108 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3484/ 159576 | consumed samples: 64240 | elapsed time per iteration (ms): 14567.5 | learning rate: 1.780E-05 | global batch size: 32 | lm loss: 6.589795E+00 | loss scale: 32768.0 | grad norm: 126611.070 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3485/ 159576 | consumed samples: 64272 | elapsed time per iteration (ms): 14495.3 | learning rate: 1.781E-05 | global batch size: 32 | lm loss: 6.497936E+00 | loss scale: 32768.0 | grad norm: 122115.644 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3486/ 159576 | consumed samples: 64304 | elapsed time per iteration (ms): 14568.8 | learning rate: 1.782E-05 | global batch size: 32 | lm loss: 6.558665E+00 | loss scale: 32768.0 | grad norm: 126373.837 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3487/ 159576 | consumed samples: 64336 | elapsed time per iteration (ms): 14913.4 | learning rate: 1.783E-05 | global batch size: 32 | lm loss: 6.431637E+00 | loss scale: 32768.0 | grad norm: 161636.464 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3488/ 159576 | consumed samples: 64368 | elapsed time per iteration (ms): 14528.7 | learning rate: 1.784E-05 | global batch size: 32 | lm loss: 6.356628E+00 | loss scale: 32768.0 | grad norm: 114700.134 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3489/ 159576 | consumed samples: 64400 | elapsed time per iteration (ms): 14522.5 | learning rate: 1.784E-05 | global batch size: 32 | lm loss: 6.470509E+00 | loss scale: 32768.0 | grad norm: 157358.888 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3490/ 159576 | consumed samples: 64432 | elapsed time per iteration (ms): 14512.2 | learning rate: 1.785E-05 | global batch size: 32 | lm loss: 6.580731E+00 | loss scale: 32768.0 | grad norm: 124839.092 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3491/ 159576 | consumed samples: 64464 | elapsed time per iteration (ms): 14760.8 | learning rate: 1.786E-05 | global batch size: 32 | lm loss: 6.545910E+00 | loss scale: 32768.0 | grad norm: 225734.887 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3492/ 159576 | consumed samples: 64496 | elapsed time per iteration (ms): 14465.1 | learning rate: 1.787E-05 | global batch size: 32 | lm loss: 6.462240E+00 | loss scale: 32768.0 | grad norm: 157153.606 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3493/ 159576 | consumed samples: 64528 | elapsed time per iteration (ms): 14555.7 | learning rate: 1.788E-05 | global batch size: 32 | lm loss: 6.526244E+00 | loss scale: 32768.0 | grad norm: 134834.105 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3494/ 159576 | consumed samples: 64560 | elapsed time per iteration (ms): 14523.5 | learning rate: 1.789E-05 | global batch size: 32 | lm loss: 6.464767E+00 | loss scale: 32768.0 | grad norm: 111080.299 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3495/ 159576 | consumed samples: 64592 | elapsed time per iteration (ms): 14680.5 | learning rate: 1.790E-05 | global batch size: 32 | lm loss: 6.498696E+00 | loss scale: 32768.0 | grad norm: 149926.493 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3496/ 159576 | consumed samples: 64624 | elapsed time per iteration (ms): 14537.6 | learning rate: 1.791E-05 | global batch size: 32 | lm loss: 6.801207E+00 | loss scale: 32768.0 | grad norm: 169978.323 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3497/ 159576 | consumed samples: 64656 | elapsed time per iteration (ms): 14576.8 | learning rate: 1.792E-05 | global batch size: 32 | lm loss: 6.458578E+00 | loss scale: 32768.0 | grad norm: 128624.834 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3498/ 159576 | consumed samples: 64688 | elapsed time per iteration (ms): 14451.0 | learning rate: 1.792E-05 | global batch size: 32 | lm loss: 6.562904E+00 | loss scale: 32768.0 | grad norm: 201818.910 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3499/ 159576 | consumed samples: 64720 | elapsed time per iteration (ms): 14843.4 | learning rate: 1.793E-05 | global batch size: 32 | lm loss: 6.620703E+00 | loss scale: 32768.0 | grad norm: 136369.889 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3500/ 159576 | consumed samples: 64752 | elapsed time per iteration (ms): 14591.5 | learning rate: 1.794E-05 | global batch size: 32 | lm loss: 6.545550E+00 | loss scale: 32768.0 | grad norm: 169642.276 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3501/ 159576 | consumed samples: 64784 | elapsed time per iteration (ms): 14557.9 | learning rate: 1.795E-05 | global batch size: 32 | lm loss: 6.401666E+00 | loss scale: 32768.0 | grad norm: 152333.231 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3502/ 159576 | consumed samples: 64816 | elapsed time per iteration (ms): 14554.3 | learning rate: 1.796E-05 | global batch size: 32 | lm loss: 6.776519E+00 | loss scale: 32768.0 | grad norm: 234394.263 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3503/ 159576 | consumed samples: 64848 | elapsed time per iteration (ms): 14868.0 | learning rate: 1.797E-05 | global batch size: 32 | lm loss: 6.465873E+00 | loss scale: 32768.0 | grad norm: 117665.279 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3504/ 159576 | consumed samples: 64880 | elapsed time per iteration (ms): 14552.4 | learning rate: 1.798E-05 | global batch size: 32 | lm loss: 6.534934E+00 | loss scale: 32768.0 | grad norm: 205418.453 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3505/ 159576 | consumed samples: 64912 | elapsed time per iteration (ms): 14532.4 | learning rate: 1.799E-05 | global batch size: 32 | lm loss: 6.777419E+00 | loss scale: 32768.0 | grad norm: 156642.326 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3506/ 159576 | consumed samples: 64944 | elapsed time per iteration (ms): 14549.9 | learning rate: 1.800E-05 | global batch size: 32 | lm loss: 6.528007E+00 | loss scale: 32768.0 | grad norm: 168324.988 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3507/ 159576 | consumed samples: 64976 | elapsed time per iteration (ms): 14947.6 | learning rate: 1.800E-05 | global batch size: 32 | lm loss: 6.669527E+00 | loss scale: 32768.0 | grad norm: 116164.306 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3508/ 159576 | consumed samples: 65008 | elapsed time per iteration (ms): 14485.1 | learning rate: 1.801E-05 | global batch size: 32 | lm loss: 6.649974E+00 | loss scale: 32768.0 | grad norm: 195968.521 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3509/ 159576 | consumed samples: 65040 | elapsed time per iteration (ms): 14549.4 | learning rate: 1.802E-05 | global batch size: 32 | lm loss: 6.636446E+00 | loss scale: 32768.0 | grad norm: 135969.732 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3510/ 159576 | consumed samples: 65072 | elapsed time per iteration (ms): 14546.9 | learning rate: 1.803E-05 | global batch size: 32 | lm loss: 6.529005E+00 | loss scale: 32768.0 | grad norm: 225903.317 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3511/ 159576 | consumed samples: 65104 | elapsed time per iteration (ms): 14847.8 | learning rate: 1.804E-05 | global batch size: 32 | lm loss: 6.629415E+00 | loss scale: 32768.0 | grad norm: 130652.559 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3512/ 159576 | consumed samples: 65136 | elapsed time per iteration (ms): 14520.0 | learning rate: 1.805E-05 | global batch size: 32 | lm loss: 6.599288E+00 | loss scale: 32768.0 | grad norm: 149863.059 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3513/ 159576 | consumed samples: 65168 | elapsed time per iteration (ms): 14651.1 | learning rate: 1.806E-05 | global batch size: 32 | lm loss: 6.592654E+00 | loss scale: 32768.0 | grad norm: 166996.968 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3514/ 159576 | consumed samples: 65200 | elapsed time per iteration (ms): 14479.3 | learning rate: 1.807E-05 | global batch size: 32 | lm loss: 6.540200E+00 | loss scale: 32768.0 | grad norm: 115498.690 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3515/ 159576 | consumed samples: 65232 | elapsed time per iteration (ms): 14930.0 | learning rate: 1.808E-05 | global batch size: 32 | lm loss: 6.488201E+00 | loss scale: 32768.0 | grad norm: 217689.196 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3516/ 159576 | consumed samples: 65264 | elapsed time per iteration (ms): 14459.8 | learning rate: 1.808E-05 | global batch size: 32 | lm loss: 6.478746E+00 | loss scale: 32768.0 | grad norm: 131460.444 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3517/ 159576 | consumed samples: 65296 | elapsed time per iteration (ms): 14524.9 | learning rate: 1.809E-05 | global batch size: 32 | lm loss: 6.658568E+00 | loss scale: 32768.0 | grad norm: 186540.119 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3518/ 159576 | consumed samples: 65328 | elapsed time per iteration (ms): 14525.2 | learning rate: 1.810E-05 | global batch size: 32 | lm loss: 6.641760E+00 | loss scale: 32768.0 | grad norm: 215453.929 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3519/ 159576 | consumed samples: 65360 | elapsed time per iteration (ms): 14903.9 | learning rate: 1.811E-05 | global batch size: 32 | lm loss: 6.578794E+00 | loss scale: 32768.0 | grad norm: 129785.760 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3520/ 159576 | consumed samples: 65392 | elapsed time per iteration (ms): 14710.5 | learning rate: 1.812E-05 | global batch size: 32 | lm loss: 6.623507E+00 | loss scale: 32768.0 | grad norm: 120935.963 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3521/ 159576 | consumed samples: 65424 | elapsed time per iteration (ms): 14520.7 | learning rate: 1.813E-05 | global batch size: 32 | lm loss: 6.597843E+00 | loss scale: 32768.0 | grad norm: 116244.009 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3522/ 159576 | consumed samples: 65456 | elapsed time per iteration (ms): 14597.0 | learning rate: 1.814E-05 | global batch size: 32 | lm loss: 6.504926E+00 | loss scale: 32768.0 | grad norm: 134767.376 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3523/ 159576 | consumed samples: 65488 | elapsed time per iteration (ms): 14942.9 | learning rate: 1.815E-05 | global batch size: 32 | lm loss: 6.435289E+00 | loss scale: 32768.0 | grad norm: 86682.164 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3524/ 159576 | consumed samples: 65520 | elapsed time per iteration (ms): 14654.2 | learning rate: 1.816E-05 | global batch size: 32 | lm loss: 6.594196E+00 | loss scale: 32768.0 | grad norm: 134027.315 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3525/ 159576 | consumed samples: 65552 | elapsed time per iteration (ms): 14562.7 | learning rate: 1.816E-05 | global batch size: 32 | lm loss: 6.679243E+00 | loss scale: 32768.0 | grad norm: 125221.442 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3526/ 159576 | consumed samples: 65584 | elapsed time per iteration (ms): 14630.7 | learning rate: 1.817E-05 | global batch size: 32 | lm loss: 6.456674E+00 | loss scale: 32768.0 | grad norm: 86112.712 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3527/ 159576 | consumed samples: 65616 | elapsed time per iteration (ms): 14493.8 | learning rate: 1.818E-05 | global batch size: 32 | lm loss: 6.600234E+00 | loss scale: 32768.0 | grad norm: 300729.659 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3528/ 159576 | consumed samples: 65648 | elapsed time per iteration (ms): 14813.0 | learning rate: 1.819E-05 | global batch size: 32 | lm loss: 6.399897E+00 | loss scale: 32768.0 | grad norm: 153878.237 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3529/ 159576 | consumed samples: 65680 | elapsed time per iteration (ms): 14593.6 | learning rate: 1.820E-05 | global batch size: 32 | lm loss: 6.540657E+00 | loss scale: 32768.0 | grad norm: 150860.243 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3530/ 159576 | consumed samples: 65712 | elapsed time per iteration (ms): 14559.8 | learning rate: 1.821E-05 | global batch size: 32 | lm loss: 6.503862E+00 | loss scale: 32768.0 | grad norm: 149193.561 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3531/ 159576 | consumed samples: 65744 | elapsed time per iteration (ms): 14581.4 | learning rate: 1.822E-05 | global batch size: 32 | lm loss: 6.692787E+00 | loss scale: 32768.0 | grad norm: 207812.798 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3532/ 159576 | consumed samples: 65776 | elapsed time per iteration (ms): 14715.5 | learning rate: 1.823E-05 | global batch size: 32 | lm loss: 6.484317E+00 | loss scale: 32768.0 | grad norm: 161092.514 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3533/ 159576 | consumed samples: 65808 | elapsed time per iteration (ms): 14610.9 | learning rate: 1.824E-05 | global batch size: 32 | lm loss: 6.475138E+00 | loss scale: 32768.0 | grad norm: 155421.456 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3534/ 159576 | consumed samples: 65840 | elapsed time per iteration (ms): 14445.3 | learning rate: 1.824E-05 | global batch size: 32 | lm loss: 6.511703E+00 | loss scale: 32768.0 | grad norm: 114681.720 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3535/ 159576 | consumed samples: 65872 | elapsed time per iteration (ms): 14477.9 | learning rate: 1.825E-05 | global batch size: 32 | lm loss: 6.509159E+00 | loss scale: 32768.0 | grad norm: 183050.824 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3536/ 159576 | consumed samples: 65904 | elapsed time per iteration (ms): 14816.2 | learning rate: 1.826E-05 | global batch size: 32 | lm loss: 6.497670E+00 | loss scale: 32768.0 | grad norm: 96091.191 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3537/ 159576 | consumed samples: 65936 | elapsed time per iteration (ms): 14439.5 | learning rate: 1.827E-05 | global batch size: 32 | lm loss: 6.505747E+00 | loss scale: 32768.0 | grad norm: 140156.886 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3538/ 159576 | consumed samples: 65968 | elapsed time per iteration (ms): 14594.1 | learning rate: 1.828E-05 | global batch size: 32 | lm loss: 6.516546E+00 | loss scale: 32768.0 | grad norm: 97276.324 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3539/ 159576 | consumed samples: 66000 | elapsed time per iteration (ms): 14531.0 | learning rate: 1.829E-05 | global batch size: 32 | lm loss: 6.589782E+00 | loss scale: 32768.0 | grad norm: 283362.362 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3540/ 159576 | consumed samples: 66032 | elapsed time per iteration (ms): 14766.1 | learning rate: 1.830E-05 | global batch size: 32 | lm loss: 6.457118E+00 | loss scale: 32768.0 | grad norm: 119093.566 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3541/ 159576 | consumed samples: 66064 | elapsed time per iteration (ms): 14538.8 | learning rate: 1.831E-05 | global batch size: 32 | lm loss: 6.543458E+00 | loss scale: 32768.0 | grad norm: 143270.575 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3542/ 159576 | consumed samples: 66096 | elapsed time per iteration (ms): 14503.8 | learning rate: 1.832E-05 | global batch size: 32 | lm loss: 6.549830E+00 | loss scale: 32768.0 | grad norm: 146934.297 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3543/ 159576 | consumed samples: 66128 | elapsed time per iteration (ms): 14525.1 | learning rate: 1.832E-05 | global batch size: 32 | lm loss: 6.523373E+00 | loss scale: 32768.0 | grad norm: 246079.782 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3544/ 159576 | consumed samples: 66160 | elapsed time per iteration (ms): 14836.5 | learning rate: 1.833E-05 | global batch size: 32 | lm loss: 6.484323E+00 | loss scale: 32768.0 | grad norm: 150473.482 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3545/ 159576 | consumed samples: 66192 | elapsed time per iteration (ms): 14612.1 | learning rate: 1.834E-05 | global batch size: 32 | lm loss: 6.596731E+00 | loss scale: 32768.0 | grad norm: 157995.993 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3546/ 159576 | consumed samples: 66224 | elapsed time per iteration (ms): 14518.2 | learning rate: 1.835E-05 | global batch size: 32 | lm loss: 6.564546E+00 | loss scale: 32768.0 | grad norm: 164874.723 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3547/ 159576 | consumed samples: 66256 | elapsed time per iteration (ms): 14501.0 | learning rate: 1.836E-05 | global batch size: 32 | lm loss: 6.427078E+00 | loss scale: 32768.0 | grad norm: 175876.651 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3548/ 159576 | consumed samples: 66288 | elapsed time per iteration (ms): 14899.9 | learning rate: 1.837E-05 | global batch size: 32 | lm loss: 6.488606E+00 | loss scale: 32768.0 | grad norm: 198886.829 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3549/ 159576 | consumed samples: 66320 | elapsed time per iteration (ms): 14520.6 | learning rate: 1.838E-05 | global batch size: 32 | lm loss: 6.462682E+00 | loss scale: 32768.0 | grad norm: 127675.702 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3550/ 159576 | consumed samples: 66352 | elapsed time per iteration (ms): 14447.8 | learning rate: 1.839E-05 | global batch size: 32 | lm loss: 6.652044E+00 | loss scale: 32768.0 | grad norm: 140944.667 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3551/ 159576 | consumed samples: 66384 | elapsed time per iteration (ms): 14467.2 | learning rate: 1.839E-05 | global batch size: 32 | lm loss: 6.520955E+00 | loss scale: 32768.0 | grad norm: 86094.102 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3552/ 159576 | consumed samples: 66416 | elapsed time per iteration (ms): 14808.2 | learning rate: 1.840E-05 | global batch size: 32 | lm loss: 6.429432E+00 | loss scale: 32768.0 | grad norm: 116647.112 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3553/ 159576 | consumed samples: 66448 | elapsed time per iteration (ms): 14503.5 | learning rate: 1.841E-05 | global batch size: 32 | lm loss: 6.463936E+00 | loss scale: 32768.0 | grad norm: 118564.730 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3554/ 159576 | consumed samples: 66480 | elapsed time per iteration (ms): 14502.1 | learning rate: 1.842E-05 | global batch size: 32 | lm loss: 6.458220E+00 | loss scale: 32768.0 | grad norm: 112013.908 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3555/ 159576 | consumed samples: 66512 | elapsed time per iteration (ms): 14486.2 | learning rate: 1.843E-05 | global batch size: 32 | lm loss: 6.492205E+00 | loss scale: 32768.0 | grad norm: 95075.794 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3556/ 159576 | consumed samples: 66544 | elapsed time per iteration (ms): 14873.1 | learning rate: 1.844E-05 | global batch size: 32 | lm loss: 6.582590E+00 | loss scale: 32768.0 | grad norm: 160024.973 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3557/ 159576 | consumed samples: 66576 | elapsed time per iteration (ms): 14487.7 | learning rate: 1.845E-05 | global batch size: 32 | lm loss: 6.504139E+00 | loss scale: 32768.0 | grad norm: 102536.359 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3558/ 159576 | consumed samples: 66608 | elapsed time per iteration (ms): 14571.2 | learning rate: 1.846E-05 | global batch size: 32 | lm loss: 6.514203E+00 | loss scale: 32768.0 | grad norm: 221229.679 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3559/ 159576 | consumed samples: 66640 | elapsed time per iteration (ms): 14451.0 | learning rate: 1.847E-05 | global batch size: 32 | lm loss: 6.560319E+00 | loss scale: 32768.0 | grad norm: 131012.754 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3560/ 159576 | consumed samples: 66672 | elapsed time per iteration (ms): 14938.1 | learning rate: 1.847E-05 | global batch size: 32 | lm loss: 6.372297E+00 | loss scale: 32768.0 | grad norm: 139056.836 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3561/ 159576 | consumed samples: 66704 | elapsed time per iteration (ms): 14523.1 | learning rate: 1.848E-05 | global batch size: 32 | lm loss: 6.416655E+00 | loss scale: 32768.0 | grad norm: 147497.179 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3562/ 159576 | consumed samples: 66736 | elapsed time per iteration (ms): 14487.9 | learning rate: 1.849E-05 | global batch size: 32 | lm loss: 6.474949E+00 | loss scale: 32768.0 | grad norm: 174437.813 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3563/ 159576 | consumed samples: 66768 | elapsed time per iteration (ms): 14468.9 | learning rate: 1.850E-05 | global batch size: 32 | lm loss: 6.623423E+00 | loss scale: 32768.0 | grad norm: 122791.597 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3564/ 159576 | consumed samples: 66800 | elapsed time per iteration (ms): 14508.1 | learning rate: 1.851E-05 | global batch size: 32 | lm loss: 6.516719E+00 | loss scale: 32768.0 | grad norm: 125896.178 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3565/ 159576 | consumed samples: 66832 | elapsed time per iteration (ms): 14821.3 | learning rate: 1.852E-05 | global batch size: 32 | lm loss: 6.567136E+00 | loss scale: 32768.0 | grad norm: 156146.827 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3566/ 159576 | consumed samples: 66864 | elapsed time per iteration (ms): 14550.7 | learning rate: 1.853E-05 | global batch size: 32 | lm loss: 6.464426E+00 | loss scale: 32768.0 | grad norm: 112089.852 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3567/ 159576 | consumed samples: 66896 | elapsed time per iteration (ms): 14483.3 | learning rate: 1.854E-05 | global batch size: 32 | lm loss: 6.330031E+00 | loss scale: 32768.0 | grad norm: 100672.150 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3568/ 159576 | consumed samples: 66928 | elapsed time per iteration (ms): 14573.3 | learning rate: 1.855E-05 | global batch size: 32 | lm loss: 6.472744E+00 | loss scale: 32768.0 | grad norm: 206164.387 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3569/ 159576 | consumed samples: 66960 | elapsed time per iteration (ms): 14778.2 | learning rate: 1.855E-05 | global batch size: 32 | lm loss: 6.502261E+00 | loss scale: 32768.0 | grad norm: 117741.940 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3570/ 159576 | consumed samples: 66992 | elapsed time per iteration (ms): 14563.8 | learning rate: 1.856E-05 | global batch size: 32 | lm loss: 6.480472E+00 | loss scale: 32768.0 | grad norm: 180667.970 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3571/ 159576 | consumed samples: 67024 | elapsed time per iteration (ms): 14517.4 | learning rate: 1.857E-05 | global batch size: 32 | lm loss: 6.653479E+00 | loss scale: 32768.0 | grad norm: 121625.335 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3572/ 159576 | consumed samples: 67056 | elapsed time per iteration (ms): 14532.0 | learning rate: 1.858E-05 | global batch size: 32 | lm loss: 6.478413E+00 | loss scale: 32768.0 | grad norm: 135823.282 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3573/ 159576 | consumed samples: 67088 | elapsed time per iteration (ms): 14807.4 | learning rate: 1.859E-05 | global batch size: 32 | lm loss: 6.589501E+00 | loss scale: 32768.0 | grad norm: 147763.903 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3574/ 159576 | consumed samples: 67120 | elapsed time per iteration (ms): 14483.4 | learning rate: 1.860E-05 | global batch size: 32 | lm loss: 6.503617E+00 | loss scale: 32768.0 | grad norm: 85865.567 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3575/ 159576 | consumed samples: 67152 | elapsed time per iteration (ms): 14505.6 | learning rate: 1.861E-05 | global batch size: 32 | lm loss: 6.573061E+00 | loss scale: 32768.0 | grad norm: 180050.879 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3576/ 159576 | consumed samples: 67184 | elapsed time per iteration (ms): 14550.9 | learning rate: 1.862E-05 | global batch size: 32 | lm loss: 6.480776E+00 | loss scale: 32768.0 | grad norm: 122066.327 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3577/ 159576 | consumed samples: 67216 | elapsed time per iteration (ms): 14868.6 | learning rate: 1.863E-05 | global batch size: 32 | lm loss: 6.625753E+00 | loss scale: 32768.0 | grad norm: 166062.326 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3578/ 159576 | consumed samples: 67248 | elapsed time per iteration (ms): 14594.8 | learning rate: 1.863E-05 | global batch size: 32 | lm loss: 6.470201E+00 | loss scale: 32768.0 | grad norm: 158898.525 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
[2021-09-24 16:06:53] PULSE: tr8-104B is waiting for the previous job to finish before scheduling a new one using the dependency mechanism (1165978_[1-10%1] on 'gpu_p13' partition)
[2021-09-24 16:06:53] PULSE: tr8-104B is running for 10:14:42 since 2021-09-24T05:52:11 (1162855_1 on 'gpu_p13' partition (r6i4n[5,7],r6i5n[2,7-8],r6i6n[0,2,6],r7i2n[4-5],r7i6n[2-4],r7i7n[7-8],r8i0n[2-3,5-8],r8i1n[0,2-4],r8i2n8,r8i3n[0-2],r8i5n[3-4],r8i7n[3-8],r9i0n[0-2],r9i1n[0-3],r9i2n[3-5,8],r9i3n[0-1,7-8],r9i4n[0-2],r9i5n[3-8],r9i6n[0,7-8])
iteration 3579/ 159576 | consumed samples: 67280 | elapsed time per iteration (ms): 14505.5 | learning rate: 1.864E-05 | global batch size: 32 | lm loss: 6.669123E+00 | loss scale: 32768.0 | grad norm: 114371.472 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3580/ 159576 | consumed samples: 67312 | elapsed time per iteration (ms): 14435.4 | learning rate: 1.865E-05 | global batch size: 32 | lm loss: 6.504656E+00 | loss scale: 32768.0 | grad norm: 143322.183 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3581/ 159576 | consumed samples: 67344 | elapsed time per iteration (ms): 14983.8 | learning rate: 1.866E-05 | global batch size: 32 | lm loss: 6.634960E+00 | loss scale: 32768.0 | grad norm: 124051.571 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3582/ 159576 | consumed samples: 67376 | elapsed time per iteration (ms): 14518.7 | learning rate: 1.867E-05 | global batch size: 32 | lm loss: 6.488723E+00 | loss scale: 32768.0 | grad norm: 108661.260 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3583/ 159576 | consumed samples: 67408 | elapsed time per iteration (ms): 14495.4 | learning rate: 1.868E-05 | global batch size: 32 | lm loss: 6.397575E+00 | loss scale: 32768.0 | grad norm: 156428.484 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3584/ 159576 | consumed samples: 67440 | elapsed time per iteration (ms): 14500.4 | learning rate: 1.869E-05 | global batch size: 32 | lm loss: 6.505555E+00 | loss scale: 32768.0 | grad norm: 158735.801 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3585/ 159576 | consumed samples: 67472 | elapsed time per iteration (ms): 14850.8 | learning rate: 1.870E-05 | global batch size: 32 | lm loss: 6.384704E+00 | loss scale: 32768.0 | grad norm: 121455.406 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3586/ 159576 | consumed samples: 67504 | elapsed time per iteration (ms): 14516.1 | learning rate: 1.871E-05 | global batch size: 32 | lm loss: 6.391223E+00 | loss scale: 32768.0 | grad norm: 200272.961 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3587/ 159576 | consumed samples: 67536 | elapsed time per iteration (ms): 14478.9 | learning rate: 1.871E-05 | global batch size: 32 | lm loss: 6.602296E+00 | loss scale: 32768.0 | grad norm: 156857.138 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3588/ 159576 | consumed samples: 67568 | elapsed time per iteration (ms): 14457.3 | learning rate: 1.872E-05 | global batch size: 32 | lm loss: 6.356599E+00 | loss scale: 32768.0 | grad norm: 132240.106 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3589/ 159576 | consumed samples: 67600 | elapsed time per iteration (ms): 14840.9 | learning rate: 1.873E-05 | global batch size: 32 | lm loss: 6.517581E+00 | loss scale: 32768.0 | grad norm: 101976.390 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3590/ 159576 | consumed samples: 67632 | elapsed time per iteration (ms): 14478.5 | learning rate: 1.874E-05 | global batch size: 32 | lm loss: 6.495076E+00 | loss scale: 32768.0 | grad norm: 145637.558 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3591/ 159576 | consumed samples: 67664 | elapsed time per iteration (ms): 14537.3 | learning rate: 1.875E-05 | global batch size: 32 | lm loss: 6.486649E+00 | loss scale: 32768.0 | grad norm: 110128.136 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3592/ 159576 | consumed samples: 67696 | elapsed time per iteration (ms): 14585.1 | learning rate: 1.876E-05 | global batch size: 32 | lm loss: 6.484485E+00 | loss scale: 32768.0 | grad norm: 93123.364 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3593/ 159576 | consumed samples: 67728 | elapsed time per iteration (ms): 14970.8 | learning rate: 1.877E-05 | global batch size: 32 | lm loss: 6.605970E+00 | loss scale: 32768.0 | grad norm: 196733.888 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3594/ 159576 | consumed samples: 67760 | elapsed time per iteration (ms): 14488.2 | learning rate: 1.878E-05 | global batch size: 32 | lm loss: 6.408032E+00 | loss scale: 32768.0 | grad norm: 119062.835 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3595/ 159576 | consumed samples: 67792 | elapsed time per iteration (ms): 14589.0 | learning rate: 1.879E-05 | global batch size: 32 | lm loss: 6.434669E+00 | loss scale: 32768.0 | grad norm: 163713.000
| num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3596/ 159576 | consumed samples: 67824 | elapsed time per iteration (ms): 14467.1 | learning rate: 1.879E-05 | global batch size: 32 | lm loss: 6.515763E+00 | loss scale: 32768.0 | grad norm: 123609.059 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3597/ 159576 | consumed samples: 67856 | elapsed time per iteration (ms): 14918.0 | learning rate: 1.880E-05 | global batch size: 32 | lm loss: 6.473671E+00 | loss scale: 32768.0 | grad norm: 113241.499 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3598/ 159576 | consumed samples: 67888 | elapsed time per iteration (ms): 14630.3 | learning rate: 1.881E-05 | global batch size: 32 | lm loss: 6.497471E+00 | loss scale: 32768.0 | grad norm: 180550.199 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3599/ 159576 | consumed samples: 67920 | elapsed time per iteration (ms): 14523.9 | learning rate: 1.882E-05 | global batch size: 32 | lm loss: 6.665214E+00 | loss scale: 32768.0 | grad norm: 120833.867 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3600/ 159576 | consumed samples: 67952 | elapsed time per iteration (ms): 14548.6 | learning rate: 1.883E-05 | global batch size: 32 | lm loss: 6.506467E+00 | loss scale: 32768.0 | grad norm: 124134.552 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3601/ 159576 | consumed samples: 67984 | elapsed time per iteration (ms): 14576.2 | learning rate: 1.884E-05 | global batch size: 32 | lm loss: 6.491764E+00 | loss scale: 32768.0 | grad norm: 230059.443 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3602/ 159576 | consumed samples: 68016 | elapsed time per 
iteration (ms): 14979.8 | learning rate: 1.885E-05 | global batch size: 32 | lm loss: 6.445697E+00 | loss scale: 32768.0 | grad norm: 125622.628 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3603/ 159576 | consumed samples: 68048 | elapsed time per iteration (ms): 14453.6 | learning rate: 1.886E-05 | global batch size: 32 | lm loss: 6.613330E+00 | loss scale: 32768.0 | grad norm: 166344.814 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3604/ 159576 | consumed samples: 68080 | elapsed time per iteration (ms): 14495.4 | learning rate: 1.887E-05 | global batch size: 32 | lm loss: 6.603212E+00 | loss scale: 32768.0 | grad norm: 93757.784 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3605/ 159576 | consumed samples: 68112 | elapsed time per iteration (ms): 14542.0 | learning rate: 1.887E-05 | global batch size: 32 | lm loss: 6.342390E+00 | loss scale: 32768.0 | grad norm: 130006.029 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3606/ 159576 | consumed samples: 68144 | elapsed time per iteration (ms): 14685.4 | learning rate: 1.888E-05 | global batch size: 32 | lm loss: 6.480408E+00 | loss scale: 32768.0 | grad norm: 106365.528 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3607/ 159576 | consumed samples: 68176 | elapsed time per iteration (ms): 14517.9 | learning rate: 1.889E-05 | global batch size: 32 | lm loss: 6.591272E+00 | loss scale: 32768.0 | grad norm: 171235.897 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3608/ 159576 | consumed samples: 68208 | elapsed time per iteration (ms): 14591.0 | learning rate: 1.890E-05 | global batch size: 32 | lm loss: 6.311239E+00 | loss scale: 32768.0 | grad norm: 126858.601 | num zeros: 0.0 | 
number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3609/ 159576 | consumed samples: 68240 | elapsed time per iteration (ms): 14549.9 | learning rate: 1.891E-05 | global batch size: 32 | lm loss: 6.395494E+00 | loss scale: 32768.0 | grad norm: 227345.632 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3610/ 159576 | consumed samples: 68272 | elapsed time per iteration (ms): 14677.9 | learning rate: 1.892E-05 | global batch size: 32 | lm loss: 6.557859E+00 | loss scale: 32768.0 | grad norm: 116386.145 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3611/ 159576 | consumed samples: 68304 | elapsed time per iteration (ms): 14497.7 | learning rate: 1.893E-05 | global batch size: 32 | lm loss: 6.436782E+00 | loss scale: 32768.0 | grad norm: 130216.388 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3612/ 159576 | consumed samples: 68336 | elapsed time per iteration (ms): 14516.9 | learning rate: 1.894E-05 | global batch size: 32 | lm loss: 6.523721E+00 | loss scale: 32768.0 | grad norm: 153807.816 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3613/ 159576 | consumed samples: 68368 | elapsed time per iteration (ms): 14537.1 | learning rate: 1.895E-05 | global batch size: 32 | lm loss: 6.480092E+00 | loss scale: 32768.0 | grad norm: 191977.060 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3614/ 159576 | consumed samples: 68400 | elapsed time per iteration (ms): 14777.4 | learning rate: 1.895E-05 | global batch size: 32 | lm loss: 6.507137E+00 | loss scale: 32768.0 | grad norm: 147123.785 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3615/ 159576 | consumed samples: 68432 | elapsed time per iteration (ms): 
14631.8 | learning rate: 1.896E-05 | global batch size: 32 | lm loss: 6.413469E+00 | loss scale: 32768.0 | grad norm: 151298.616 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3616/ 159576 | consumed samples: 68464 | elapsed time per iteration (ms): 14498.7 | learning rate: 1.897E-05 | global batch size: 32 | lm loss: 6.400654E+00 | loss scale: 32768.0 | grad norm: 144773.834 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3617/ 159576 | consumed samples: 68496 | elapsed time per iteration (ms): 14516.2 | learning rate: 1.898E-05 | global batch size: 32 | lm loss: 6.514056E+00 | loss scale: 32768.0 | grad norm: 212184.973 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3618/ 159576 | consumed samples: 68528 | elapsed time per iteration (ms): 15120.1 | learning rate: 1.899E-05 | global batch size: 32 | lm loss: 6.476982E+00 | loss scale: 32768.0 | grad norm: 138389.337 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3619/ 159576 | consumed samples: 68560 | elapsed time per iteration (ms): 14520.5 | learning rate: 1.900E-05 | global batch size: 32 | lm loss: 6.413394E+00 | loss scale: 32768.0 | grad norm: 144757.897 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3620/ 159576 | consumed samples: 68592 | elapsed time per iteration (ms): 14501.8 | learning rate: 1.901E-05 | global batch size: 32 | lm loss: 6.508588E+00 | loss scale: 32768.0 | grad norm: 119480.778 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3621/ 159576 | consumed samples: 68624 | elapsed time per iteration (ms): 14544.3 | learning rate: 1.902E-05 | global batch size: 32 | lm loss: 6.462088E+00 | loss scale: 32768.0 | grad norm: 118576.762 | num zeros: 0.0 | number of 
skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3622/ 159576 | consumed samples: 68656 | elapsed time per iteration (ms): 14904.8 | learning rate: 1.903E-05 | global batch size: 32 | lm loss: 6.518481E+00 | loss scale: 32768.0 | grad norm: 166384.993 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3623/ 159576 | consumed samples: 68688 | elapsed time per iteration (ms): 14536.7 | learning rate: 1.903E-05 | global batch size: 32 | lm loss: 6.418991E+00 | loss scale: 32768.0 | grad norm: 133937.631 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3624/ 159576 | consumed samples: 68720 | elapsed time per iteration (ms): 14549.8 | learning rate: 1.904E-05 | global batch size: 32 | lm loss: 6.446878E+00 | loss scale: 32768.0 | grad norm: 270206.058 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3625/ 159576 | consumed samples: 68752 | elapsed time per iteration (ms): 14599.2 | learning rate: 1.905E-05 | global batch size: 32 | lm loss: 6.534576E+00 | loss scale: 32768.0 | grad norm: 155344.465 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3626/ 159576 | consumed samples: 68784 | elapsed time per iteration (ms): 14722.9 | learning rate: 1.906E-05 | global batch size: 32 | lm loss: 6.630429E+00 | loss scale: 32768.0 | grad norm: 199114.246 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3627/ 159576 | consumed samples: 68816 | elapsed time per iteration (ms): 14500.1 | learning rate: 1.907E-05 | global batch size: 32 | lm loss: 6.356173E+00 | loss scale: 32768.0 | grad norm: 167282.135 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3628/ 159576 | consumed samples: 68848 | elapsed time per iteration (ms): 14530.4 | 
learning rate: 1.908E-05 | global batch size: 32 | lm loss: 6.471046E+00 | loss scale: 32768.0 | grad norm: 208481.248 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3629/ 159576 | consumed samples: 68880 | elapsed time per iteration (ms): 14549.1 | learning rate: 1.909E-05 | global batch size: 32 | lm loss: 6.412348E+00 | loss scale: 32768.0 | grad norm: 149105.571 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3630/ 159576 | consumed samples: 68912 | elapsed time per iteration (ms): 14882.4 | learning rate: 1.910E-05 | global batch size: 32 | lm loss: 6.520298E+00 | loss scale: 32768.0 | grad norm: 123369.844 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3631/ 159576 | consumed samples: 68944 | elapsed time per iteration (ms): 14575.6 | learning rate: 1.911E-05 | global batch size: 32 | lm loss: 6.558264E+00 | loss scale: 32768.0 | grad norm: 243133.943 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3632/ 159576 | consumed samples: 68976 | elapsed time per iteration (ms): 14516.5 | learning rate: 1.911E-05 | global batch size: 32 | lm loss: 6.583918E+00 | loss scale: 32768.0 | grad norm: 178142.765 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3633/ 159576 | consumed samples: 69008 | elapsed time per iteration (ms): 14471.4 | learning rate: 1.912E-05 | global batch size: 32 | lm loss: 6.540310E+00 | loss scale: 32768.0 | grad norm: 189782.276 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3634/ 159576 | consumed samples: 69040 | elapsed time per iteration (ms): 14945.9 | learning rate: 1.913E-05 | global batch size: 32 | lm loss: 6.505736E+00 | loss scale: 32768.0 | grad norm: 165872.968 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3635/ 159576 | consumed samples: 69072 | elapsed time per iteration (ms): 14539.5 | learning rate: 1.914E-05 | global batch size: 32 | lm loss: 6.509236E+00 | loss scale: 32768.0 | grad norm: 245470.953 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3636/ 159576 | consumed samples: 69104 | elapsed time per iteration (ms): 14545.2 | learning rate: 1.915E-05 | global batch size: 32 | lm loss: 6.504992E+00 | loss scale: 32768.0 | grad norm: 150104.290 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3637/ 159576 | consumed samples: 69136 | elapsed time per iteration (ms): 14567.6 | learning rate: 1.916E-05 | global batch size: 32 | lm loss: 6.406890E+00 | loss scale: 32768.0 | grad norm: 135913.146 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3638/ 159576 | consumed samples: 69168 | elapsed time per iteration (ms): 14896.3 | learning rate: 1.917E-05 | global batch size: 32 | lm loss: 6.443694E+00 | loss scale: 32768.0 | grad norm: 185702.085 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3639/ 159576 | consumed samples: 69200 | elapsed time per iteration (ms): 14591.0 | learning rate: 1.918E-05 | global batch size: 32 | lm loss: 6.556330E+00 | loss scale: 32768.0 | grad norm: 244123.289 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3640/ 159576 | consumed samples: 69232 | elapsed time per iteration (ms): 14549.7 | learning rate: 1.918E-05 | global batch size: 32 | lm loss: 6.487778E+00 | loss scale: 32768.0 | grad norm: 177114.568 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3641/ 159576 | consumed samples: 69264 | elapsed time per iteration (ms): 14570.7 | learning 
rate: 1.919E-05 | global batch size: 32 | lm loss: 6.513255E+00 | loss scale: 32768.0 | grad norm: 131694.234 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3642/ 159576 | consumed samples: 69296 | elapsed time per iteration (ms): 14516.4 | learning rate: 1.920E-05 | global batch size: 32 | lm loss: 6.592026E+00 | loss scale: 32768.0 | grad norm: 290876.521 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3643/ 159576 | consumed samples: 69328 | elapsed time per iteration (ms): 14756.7 | learning rate: 1.921E-05 | global batch size: 32 | lm loss: 6.662066E+00 | loss scale: 32768.0 | grad norm: 228974.687 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3644/ 159576 | consumed samples: 69360 | elapsed time per iteration (ms): 14551.2 | learning rate: 1.922E-05 | global batch size: 32 | lm loss: 6.366663E+00 | loss scale: 32768.0 | grad norm: 161091.231 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3645/ 159576 | consumed samples: 69392 | elapsed time per iteration (ms): 14619.9 | learning rate: 1.923E-05 | global batch size: 32 | lm loss: 6.523453E+00 | loss scale: 32768.0 | grad norm: 136622.848 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3646/ 159576 | consumed samples: 69424 | elapsed time per iteration (ms): 14549.7 | learning rate: 1.924E-05 | global batch size: 32 | lm loss: 6.502388E+00 | loss scale: 32768.0 | grad norm: 233041.164 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3647/ 159576 | consumed samples: 69456 | elapsed time per iteration (ms): 14639.6 | learning rate: 1.925E-05 | global batch size: 32 | lm loss: 6.570889E+00 | loss scale: 32768.0 | grad norm: 177700.635 | num zeros: 0.0 | number of skipped iterations: 0 | 
number of nan iterations: 0 | time (ms) iteration 3648/ 159576 | consumed samples: 69488 | elapsed time per iteration (ms): 14511.4 | learning rate: 1.926E-05 | global batch size: 32 | lm loss: 6.538668E+00 | loss scale: 32768.0 | grad norm: 167613.706 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3649/ 159576 | consumed samples: 69520 | elapsed time per iteration (ms): 14499.6 | learning rate: 1.926E-05 | global batch size: 32 | lm loss: 6.650812E+00 | loss scale: 32768.0 | grad norm: 144019.361 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3650/ 159576 | consumed samples: 69552 | elapsed time per iteration (ms): 14509.6 | learning rate: 1.927E-05 | global batch size: 32 | lm loss: 6.449777E+00 | loss scale: 32768.0 | grad norm: 190635.397 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3651/ 159576 | consumed samples: 69584 | elapsed time per iteration (ms): 14775.5 | learning rate: 1.928E-05 | global batch size: 32 | lm loss: 6.435673E+00 | loss scale: 32768.0 | grad norm: 181537.989 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3652/ 159576 | consumed samples: 69616 | elapsed time per iteration (ms): 14563.5 | learning rate: 1.929E-05 | global batch size: 32 | lm loss: 6.631623E+00 | loss scale: 32768.0 | grad norm: 150202.284 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3653/ 159576 | consumed samples: 69648 | elapsed time per iteration (ms): 14524.8 | learning rate: 1.930E-05 | global batch size: 32 | lm loss: 6.612866E+00 | loss scale: 32768.0 | grad norm: 136863.545 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3654/ 159576 | consumed samples: 69680 | elapsed time per iteration (ms): 14611.3 | learning rate: 1.931E-05 | 
global batch size: 32 | lm loss: 6.471664E+00 | loss scale: 32768.0 | grad norm: 177103.324 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3655/ 159576 | consumed samples: 69712 | elapsed time per iteration (ms): 14752.9 | learning rate: 1.932E-05 | global batch size: 32 | lm loss: 6.436707E+00 | loss scale: 32768.0 | grad norm: 107210.342 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3656/ 159576 | consumed samples: 69744 | elapsed time per iteration (ms): 14544.1 | learning rate: 1.933E-05 | global batch size: 32 | lm loss: 6.679466E+00 | loss scale: 32768.0 | grad norm: 156389.742 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3657/ 159576 | consumed samples: 69776 | elapsed time per iteration (ms): 14560.9 | learning rate: 1.934E-05 | global batch size: 32 | lm loss: 6.478530E+00 | loss scale: 32768.0 | grad norm: 136151.461 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3658/ 159576 | consumed samples: 69808 | elapsed time per iteration (ms): 14516.8 | learning rate: 1.934E-05 | global batch size: 32 | lm loss: 6.537941E+00 | loss scale: 32768.0 | grad norm: 169825.588 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3659/ 159576 | consumed samples: 69840 | elapsed time per iteration (ms): 15041.8 | learning rate: 1.935E-05 | global batch size: 32 | lm loss: 6.414840E+00 | loss scale: 32768.0 | grad norm: 116305.156 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3660/ 159576 | consumed samples: 69872 | elapsed time per iteration (ms): 14596.0 | learning rate: 1.936E-05 | global batch size: 32 | lm loss: 6.423607E+00 | loss scale: 32768.0 | grad norm: 157726.425 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan 
iterations: 0 | time (ms) iteration 3661/ 159576 | consumed samples: 69904 | elapsed time per iteration (ms): 14600.4 | learning rate: 1.937E-05 | global batch size: 32 | lm loss: 6.516055E+00 | loss scale: 32768.0 | grad norm: 150170.125 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3662/ 159576 | consumed samples: 69936 | elapsed time per iteration (ms): 14508.1 | learning rate: 1.938E-05 | global batch size: 32 | lm loss: 6.406610E+00 | loss scale: 32768.0 | grad norm: 180125.834 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3663/ 159576 | consumed samples: 69968 | elapsed time per iteration (ms): 14795.2 | learning rate: 1.939E-05 | global batch size: 32 | lm loss: 6.495340E+00 | loss scale: 32768.0 | grad norm: 156226.253 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3664/ 159576 | consumed samples: 70000 | elapsed time per iteration (ms): 14502.7 | learning rate: 1.940E-05 | global batch size: 32 | lm loss: 6.478324E+00 | loss scale: 32768.0 | grad norm: 139199.774 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3665/ 159576 | consumed samples: 70032 | elapsed time per iteration (ms): 14521.4 | learning rate: 1.941E-05 | global batch size: 32 | lm loss: 6.486080E+00 | loss scale: 32768.0 | grad norm: 139987.206 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3666/ 159576 | consumed samples: 70064 | elapsed time per iteration (ms): 14501.0 | learning rate: 1.942E-05 | global batch size: 32 | lm loss: 6.412463E+00 | loss scale: 32768.0 | grad norm: 187000.562 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3667/ 159576 | consumed samples: 70096 | elapsed time per iteration (ms): 14907.7 | learning rate: 1.942E-05 | global batch 
size: 32 | lm loss: 6.555160E+00 | loss scale: 32768.0 | grad norm: 151236.383 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3668/ 159576 | consumed samples: 70128 | elapsed time per iteration (ms): 14546.0 | learning rate: 1.943E-05 | global batch size: 32 | lm loss: 6.466833E+00 | loss scale: 32768.0 | grad norm: 188341.809 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3669/ 159576 | consumed samples: 70160 | elapsed time per iteration (ms): 14504.0 | learning rate: 1.944E-05 | global batch size: 32 | lm loss: 6.512917E+00 | loss scale: 32768.0 | grad norm: 142898.213 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3670/ 159576 | consumed samples: 70192 | elapsed time per iteration (ms): 14550.7 | learning rate: 1.945E-05 | global batch size: 32 | lm loss: 6.662933E+00 | loss scale: 32768.0 | grad norm: 155470.352 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3671/ 159576 | consumed samples: 70224 | elapsed time per iteration (ms): 14892.4 | learning rate: 1.946E-05 | global batch size: 32 | lm loss: 6.373161E+00 | loss scale: 32768.0 | grad norm: 150042.585 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3672/ 159576 | consumed samples: 70256 | elapsed time per iteration (ms): 14566.7 | learning rate: 1.947E-05 | global batch size: 32 | lm loss: 6.426474E+00 | loss scale: 32768.0 | grad norm: 170805.274 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3673/ 159576 | consumed samples: 70288 | elapsed time per iteration (ms): 14501.7 | learning rate: 1.948E-05 | global batch size: 32 | lm loss: 6.370544E+00 | loss scale: 32768.0 | grad norm: 138493.754 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | 
time (ms) iteration 3674/ 159576 | consumed samples: 70320 | elapsed time per iteration (ms): 14600.9 | learning rate: 1.949E-05 | global batch size: 32 | lm loss: 6.383911E+00 | loss scale: 32768.0 | grad norm: 137200.588 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3675/ 159576 | consumed samples: 70352 | elapsed time per iteration (ms): 14904.3 | learning rate: 1.950E-05 | global batch size: 32 | lm loss: 6.430146E+00 | loss scale: 32768.0 | grad norm: 130856.844 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3676/ 159576 | consumed samples: 70384 | elapsed time per iteration (ms): 14544.1 | learning rate: 1.950E-05 | global batch size: 32 | lm loss: 6.359234E+00 | loss scale: 32768.0 | grad norm: 123290.267 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3677/ 159576 | consumed samples: 70416 | elapsed time per iteration (ms): 14660.6 | learning rate: 1.951E-05 | global batch size: 32 | lm loss: 6.340640E+00 | loss scale: 32768.0 | grad norm: 128445.878 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3678/ 159576 | consumed samples: 70448 | elapsed time per iteration (ms): 14469.4 | learning rate: 1.952E-05 | global batch size: 32 | lm loss: 6.467716E+00 | loss scale: 32768.0 | grad norm: 222732.002 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3679/ 159576 | consumed samples: 70480 | elapsed time per iteration (ms): 14540.6 | learning rate: 1.953E-05 | global batch size: 32 | lm loss: 6.401999E+00 | loss scale: 32768.0 | grad norm: 143732.695 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3680/ 159576 | consumed samples: 70512 | elapsed time per iteration (ms): 14837.8 | learning rate: 1.954E-05 | global batch size: 32 | lm loss: 
6.469200E+00 | loss scale: 32768.0 | grad norm: 148617.864 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3681/ 159576 | consumed samples: 70544 | elapsed time per iteration (ms): 14560.6 | learning rate: 1.955E-05 | global batch size: 32 | lm loss: 6.503996E+00 | loss scale: 32768.0 | grad norm: 151584.667 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3682/ 159576 | consumed samples: 70576 | elapsed time per iteration (ms): 14533.4 | learning rate: 1.956E-05 | global batch size: 32 | lm loss: 6.473675E+00 | loss scale: 32768.0 | grad norm: 171148.885 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3683/ 159576 | consumed samples: 70608 | elapsed time per iteration (ms): 14606.7 | learning rate: 1.957E-05 | global batch size: 32 | lm loss: 6.406356E+00 | loss scale: 32768.0 | grad norm: 139281.574 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3684/ 159576 | consumed samples: 70640 | elapsed time per iteration (ms): 14772.8 | learning rate: 1.958E-05 | global batch size: 32 | lm loss: 6.329139E+00 | loss scale: 32768.0 | grad norm: 108055.730 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3685/ 159576 | consumed samples: 70672 | elapsed time per iteration (ms): 14518.6 | learning rate: 1.958E-05 | global batch size: 32 | lm loss: 6.525671E+00 | loss scale: 32768.0 | grad norm: 204684.374 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3686/ 159576 | consumed samples: 70704 | elapsed time per iteration (ms): 14569.3 | learning rate: 1.959E-05 | global batch size: 32 | lm loss: 6.454522E+00 | loss scale: 32768.0 | grad norm: 108450.408 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 
iteration 3687/ 159576 | consumed samples: 70736 | elapsed time per iteration (ms): 14527.9 | learning rate: 1.960E-05 | global batch size: 32 | lm loss: 6.452621E+00 | loss scale: 32768.0 | grad norm: 154981.336 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3688/ 159576 | consumed samples: 70768 | elapsed time per iteration (ms): 14681.9 | learning rate: 1.961E-05 | global batch size: 32 | lm loss: 6.485929E+00 | loss scale: 32768.0 | grad norm: 132389.054 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3689/ 159576 | consumed samples: 70800 | elapsed time per iteration (ms): 14628.9 | learning rate: 1.962E-05 | global batch size: 32 | lm loss: 6.560607E+00 | loss scale: 32768.0 | grad norm: 244618.808 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3690/ 159576 | consumed samples: 70832 | elapsed time per iteration (ms): 14570.6 | learning rate: 1.963E-05 | global batch size: 32 | lm loss: 6.545405E+00 | loss scale: 32768.0 | grad norm: 207471.493 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3691/ 159576 | consumed samples: 70864 | elapsed time per iteration (ms): 14568.4 | learning rate: 1.964E-05 | global batch size: 32 | lm loss: 6.403141E+00 | loss scale: 32768.0 | grad norm: 160751.609 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3692/ 159576 | consumed samples: 70896 | elapsed time per iteration (ms): 14828.9 | learning rate: 1.965E-05 | global batch size: 32 | lm loss: 6.494320E+00 | loss scale: 32768.0 | grad norm: 142715.734 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3693/ 159576 | consumed samples: 70928 | elapsed time per iteration (ms): 14576.4 | learning rate: 1.966E-05 | global batch size: 32 | lm loss: 6.317194E+00 | loss scale: 32768.0 | grad norm: 218725.519 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3694/ 159576 | consumed samples: 70960 | elapsed time per iteration (ms): 14558.1 | learning rate: 1.966E-05 | global batch size: 32 | lm loss: 6.404289E+00 | loss scale: 32768.0 | grad norm: 133735.905 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3695/ 159576 | consumed samples: 70992 | elapsed time per iteration (ms): 14502.5 | learning rate: 1.967E-05 | global batch size: 32 | lm loss: 6.501413E+00 | loss scale: 32768.0 | grad norm: 126881.621 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3696/ 159576 | consumed samples: 71024 | elapsed time per iteration (ms): 14876.1 | learning rate: 1.968E-05 | global batch size: 32 | lm loss: 6.348512E+00 | loss scale: 32768.0 | grad norm: 117844.529 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3697/ 159576 | consumed samples: 71056 | elapsed time per iteration (ms): 14704.7 | learning rate: 1.969E-05 | global batch size: 32 | lm loss: 6.490881E+00 | loss scale: 32768.0 | grad norm: 191050.826 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3698/ 159576 | consumed samples: 71088 | elapsed time per iteration (ms): 14521.5 | learning rate: 1.970E-05 | global batch size: 32 | lm loss: 6.399506E+00 | loss scale: 32768.0 | grad norm: 131579.663 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3699/ 159576 | consumed samples: 71120 | elapsed time per iteration (ms): 14570.1 | learning rate: 1.971E-05 | global batch size: 32 | lm loss: 6.507861E+00 | loss scale: 32768.0 | grad norm: 124970.942 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3700/ 159576 | consumed samples: 71152 | elapsed time per iteration (ms): 15037.4 | learning rate: 1.972E-05 | global batch size: 32 | lm loss: 6.460707E+00 | loss scale: 32768.0 | grad norm: 163864.847 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3701/ 159576 | consumed samples: 71184 | elapsed time per iteration (ms): 14616.1 | learning rate: 1.973E-05 | global batch size: 32 | lm loss: 6.410345E+00 | loss scale: 32768.0 | grad norm: 155995.488 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3702/ 159576 | consumed samples: 71216 | elapsed time per iteration (ms): 14555.1 | learning rate: 1.974E-05 | global batch size: 32 | lm loss: 6.418409E+00 | loss scale: 32768.0 | grad norm: 135398.679 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3703/ 159576 | consumed samples: 71248 | elapsed time per iteration (ms): 14529.9 | learning rate: 1.974E-05 | global batch size: 32 | lm loss: 6.445669E+00 | loss scale: 32768.0 | grad norm: 149575.193 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3704/ 159576 | consumed samples: 71280 | elapsed time per iteration (ms): 14938.6 | learning rate: 1.975E-05 | global batch size: 32 | lm loss: 6.466682E+00 | loss scale: 32768.0 | grad norm: 158480.859 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3705/ 159576 | consumed samples: 71312 | elapsed time per iteration (ms): 14501.2 | learning rate: 1.976E-05 | global batch size: 32 | lm loss: 6.391745E+00 | loss scale: 32768.0 | grad norm: 130405.062 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3706/ 159576 | consumed samples: 71344 | elapsed time per iteration (ms): 14560.8 | learning rate: 1.977E-05 | global batch size: 32 | lm loss: 6.367959E+00 | loss scale: 32768.0 | grad norm: 134894.924 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3707/ 159576 | consumed samples: 71376 | elapsed time per iteration (ms): 14606.1 | learning rate: 1.978E-05 | global batch size: 32 | lm loss: 6.568520E+00 | loss scale: 32768.0 | grad norm: 127252.532 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3708/ 159576 | consumed samples: 71408 | elapsed time per iteration (ms): 14831.0 | learning rate: 1.979E-05 | global batch size: 32 | lm loss: 6.451063E+00 | loss scale: 32768.0 | grad norm: 352497.893 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3709/ 159576 | consumed samples: 71440 | elapsed time per iteration (ms): 14547.0 | learning rate: 1.980E-05 | global batch size: 32 | lm loss: 6.534979E+00 | loss scale: 32768.0 | grad norm: 139565.555 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3710/ 159576 | consumed samples: 71472 | elapsed time per iteration (ms): 14583.9 | learning rate: 1.981E-05 | global batch size: 32 | lm loss: 6.561714E+00 | loss scale: 32768.0 | grad norm: 190647.531 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3711/ 159576 | consumed samples: 71504 | elapsed time per iteration (ms): 14605.2 | learning rate: 1.982E-05 | global batch size: 32 | lm loss: 6.594619E+00 | loss scale: 32768.0 | grad norm: 159179.628 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3712/ 159576 | consumed samples: 71536 | elapsed time per iteration (ms): 14853.8 | learning rate: 1.982E-05 | global batch size: 32 | lm loss: 6.221584E+00 | loss scale: 32768.0 | grad norm: 163662.318 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3713/ 159576 | consumed samples: 71568 | elapsed time per iteration (ms): 14625.6 | learning rate: 1.983E-05 | global batch size: 32 | lm loss: 6.384083E+00 | loss scale: 32768.0 | grad norm: 157426.857 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3714/ 159576 | consumed samples: 71600 | elapsed time per iteration (ms): 14617.1 | learning rate: 1.984E-05 | global batch size: 32 | lm loss: 6.457389E+00 | loss scale: 32768.0 | grad norm: 163827.138 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3715/ 159576 | consumed samples: 71632 | elapsed time per iteration (ms): 14519.7 | learning rate: 1.985E-05 | global batch size: 32 | lm loss: 6.461262E+00 | loss scale: 32768.0 | grad norm: 150641.403 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3716/ 159576 | consumed samples: 71664 | elapsed time per iteration (ms): 14921.5 | learning rate: 1.986E-05 | global batch size: 32 | lm loss: 6.345608E+00 | loss scale: 32768.0 | grad norm: 146728.063 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3717/ 159576 | consumed samples: 71696 | elapsed time per iteration (ms): 14643.5 | learning rate: 1.987E-05 | global batch size: 32 | lm loss: 6.488680E+00 | loss scale: 32768.0 | grad norm: 159547.980 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3718/ 159576 | consumed samples: 71728 | elapsed time per iteration (ms): 14531.6 | learning rate: 1.988E-05 | global batch size: 32 | lm loss: 6.358843E+00 | loss scale: 32768.0 | grad norm: 120331.967 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3719/ 159576 | consumed samples: 71760 | elapsed time per iteration (ms): 14544.0 | learning rate: 1.989E-05 | global batch size: 32 | lm loss: 6.480108E+00 | loss scale: 32768.0 | grad norm: 136903.050 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3720/ 159576 | consumed samples: 71792 | elapsed time per iteration (ms): 14789.8 | learning rate: 1.989E-05 | global batch size: 32 | lm loss: 6.423407E+00 | loss scale: 32768.0 | grad norm: 144666.737 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3721/ 159576 | consumed samples: 71824 | elapsed time per iteration (ms): 14759.3 | learning rate: 1.990E-05 | global batch size: 32 | lm loss: 6.280478E+00 | loss scale: 32768.0 | grad norm: 131505.636 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3722/ 159576 | consumed samples: 71856 | elapsed time per iteration (ms): 14493.1 | learning rate: 1.991E-05 | global batch size: 32 | lm loss: 6.341520E+00 | loss scale: 32768.0 | grad norm: 153861.927 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3723/ 159576 | consumed samples: 71888 | elapsed time per iteration (ms): 14523.6 | learning rate: 1.992E-05 | global batch size: 32 | lm loss: 6.470270E+00 | loss scale: 32768.0 | grad norm: 129755.757 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3724/ 159576 | consumed samples: 71920 | elapsed time per iteration (ms): 14486.1 | learning rate: 1.993E-05 | global batch size: 32 | lm loss: 6.425168E+00 | loss scale: 32768.0 | grad norm: 117324.517 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3725/ 159576 | consumed samples: 71952 | elapsed time per iteration (ms): 14760.5 | learning rate: 1.994E-05 | global batch size: 32 | lm loss: 6.508280E+00 | loss scale: 32768.0 | grad norm: 128492.118 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3726/ 159576 | consumed samples: 71984 | elapsed time per iteration (ms): 14523.7 | learning rate: 1.995E-05 | global batch size: 32 | lm loss: 6.451111E+00 | loss scale: 32768.0 | grad norm: 167230.725 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3727/ 159576 | consumed samples: 72016 | elapsed time per iteration (ms): 14569.3 | learning rate: 1.996E-05 | global batch size: 32 | lm loss: 6.428119E+00 | loss scale: 32768.0 | grad norm: 118648.215 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3728/ 159576 | consumed samples: 72048 | elapsed time per iteration (ms): 14495.2 | learning rate: 1.997E-05 | global batch size: 32 | lm loss: 6.472005E+00 | loss scale: 32768.0 | grad norm: 129074.469 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3729/ 159576 | consumed samples: 72080 | elapsed time per iteration (ms): 14750.9 | learning rate: 1.997E-05 | global batch size: 32 | lm loss: 6.501527E+00 | loss scale: 32768.0 | grad norm: 149114.403 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3730/ 159576 | consumed samples: 72112 | elapsed time per iteration (ms): 14542.0 | learning rate: 1.998E-05 | global batch size: 32 | lm loss: 6.441484E+00 | loss scale: 32768.0 | grad norm: 115103.080 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3731/ 159576 | consumed samples: 72144 | elapsed time per iteration (ms): 14563.9 | learning rate: 1.999E-05 | global batch size: 32 | lm loss: 6.365570E+00 | loss scale: 32768.0 | grad norm: 122866.924 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3732/ 159576 | consumed samples: 72176 | elapsed time per iteration (ms): 14514.0 | learning rate: 2.000E-05 | global batch size: 32 | lm loss: 6.432354E+00 | loss scale: 32768.0 | grad norm: 117503.601 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3733/ 159576 | consumed samples: 72208 | elapsed time per iteration (ms): 14782.6 | learning rate: 2.001E-05 | global batch size: 32 | lm loss: 6.406446E+00 | loss scale: 32768.0 | grad norm: 118771.624 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3734/ 159576 | consumed samples: 72240 | elapsed time per iteration (ms): 14599.5 | learning rate: 2.002E-05 | global batch size: 32 | lm loss: 6.564467E+00 | loss scale: 32768.0 | grad norm: 113605.510 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3735/ 159576 | consumed samples: 72272 | elapsed time per iteration (ms): 14490.9 | learning rate: 2.003E-05 | global batch size: 32 | lm loss: 6.709463E+00 | loss scale: 32768.0 | grad norm: 143048.505 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3736/ 159576 | consumed samples: 72304 | elapsed time per iteration (ms): 14616.2 | learning rate: 2.004E-05 | global batch size: 32 | lm loss: 6.388952E+00 | loss scale: 32768.0 | grad norm: 148752.246 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3737/ 159576 | consumed samples: 72336 | elapsed time per iteration (ms): 14690.4 | learning rate: 2.005E-05 | global batch size: 32 | lm loss: 6.671305E+00 | loss scale: 32768.0 | grad norm: 167080.674 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3738/ 159576 | consumed samples: 72368 | elapsed time per iteration (ms): 14577.2 | learning rate: 2.005E-05 | global batch size: 32 | lm loss: 6.441625E+00 | loss scale: 32768.0 | grad norm: 132744.798 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3739/ 159576 | consumed samples: 72400 | elapsed time per iteration (ms): 14526.3 | learning rate: 2.006E-05 | global batch size: 32 | lm loss: 6.382997E+00 | loss scale: 32768.0 | grad norm: 137597.004 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3740/ 159576 | consumed samples: 72432 | elapsed time per iteration (ms): 14497.0 | learning rate: 2.007E-05 | global batch size: 32 | lm loss: 6.423009E+00 | loss scale: 32768.0 | grad norm: 158026.136 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3741/ 159576 | consumed samples: 72464 | elapsed time per iteration (ms): 14972.2 | learning rate: 2.008E-05 | global batch size: 32 | lm loss: 6.350714E+00 | loss scale: 32768.0 | grad norm: 133556.244 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3742/ 159576 | consumed samples: 72496 | elapsed time per iteration (ms): 14524.0 | learning rate: 2.009E-05 | global batch size: 32 | lm loss: 6.481720E+00 | loss scale: 32768.0 | grad norm: 111295.642 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3743/ 159576 | consumed samples: 72528 | elapsed time per iteration (ms): 14585.5 | learning rate: 2.010E-05 | global batch size: 32 | lm loss: 6.427812E+00 | loss scale: 32768.0 | grad norm: 147125.472 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3744/ 159576 | consumed samples: 72560 | elapsed time per iteration (ms): 14494.4 | learning rate: 2.011E-05 | global batch size: 32 | lm loss: 6.548944E+00 | loss scale: 32768.0 | grad norm: 157070.428 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3745/ 159576 | consumed samples: 72592 | elapsed time per iteration (ms): 14860.3 | learning rate: 2.012E-05 | global batch size: 32 | lm loss: 6.524699E+00 | loss scale: 32768.0 | grad norm: 133650.321 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3746/ 159576 | consumed samples: 72624 | elapsed time per iteration (ms): 14524.8 | learning rate: 2.013E-05 | global batch size: 32 | lm loss: 6.462801E+00 | loss scale: 32768.0 | grad norm: 145785.393 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3747/ 159576 | consumed samples: 72656 | elapsed time per iteration (ms): 14508.2 | learning rate: 2.013E-05 | global batch size: 32 | lm loss: 6.505124E+00 | loss scale: 32768.0 | grad norm: 159039.833 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3748/ 159576 | consumed samples: 72688 | elapsed time per iteration (ms): 14534.8 | learning rate: 2.014E-05 | global batch size: 32 | lm loss: 6.554813E+00 | loss scale: 32768.0 | grad norm: 144107.066 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3749/ 159576 | consumed samples: 72720 | elapsed time per iteration (ms): 14885.2 | learning rate: 2.015E-05 | global batch size: 32 | lm loss: 6.509037E+00 | loss scale: 32768.0 | grad norm: 139312.960 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3750/ 159576 | consumed samples: 72752 | elapsed time per iteration (ms): 14531.0 | learning rate: 2.016E-05 | global batch size: 32 | lm loss: 6.393044E+00 | loss scale: 32768.0 | grad norm: 177829.341 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3751/ 159576 | consumed samples: 72784 | elapsed time per iteration (ms): 14500.7 | learning rate: 2.017E-05 | global batch size: 32 | lm loss: 6.362189E+00 | loss scale: 32768.0 | grad norm: 176679.914 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3752/ 159576 | consumed samples: 72816 | elapsed time per iteration (ms): 14533.8 | learning rate: 2.018E-05 | global batch size: 32 | lm loss: 6.594802E+00 | loss scale: 32768.0 | grad norm: 172136.738 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3753/ 159576 | consumed samples: 72848 | elapsed time per iteration (ms): 7743.9 | learning rate: 2.018E-05 | global batch size: 32 | lm loss: 6.535247E+00 | loss scale: 32768.0 | grad norm: 172136.738 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3754/ 159576 | consumed samples: 72880 | elapsed time per iteration (ms): 14383.1 | learning rate: 2.019E-05 | global batch size: 32 | lm loss: 6.354399E+00 | loss scale: 32768.0 | grad norm: 126648.407 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3755/ 159576 | consumed samples: 72912 | elapsed time per iteration (ms): 14590.3 | learning rate: 2.020E-05 | global batch size: 32 | lm loss: 6.473662E+00 | loss scale: 32768.0 | grad norm: 156295.152 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3756/ 159576 | consumed samples: 72944 | elapsed time per iteration (ms): 7767.7 | learning rate: 2.020E-05 | global batch size: 32 | lm loss: 6.609807E+00 | loss scale: 16384.0 | grad norm: 156295.152 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3757/ 159576 | consumed samples: 72976 | elapsed time per iteration (ms): 14046.4 | learning rate: 2.021E-05 | global batch size: 32 | lm loss: 6.389218E+00 | loss scale: 16384.0 | grad norm: 71738.658 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3758/ 159576 | consumed samples: 73008 | elapsed time per iteration (ms): 14805.7 | learning rate: 2.021E-05 | global batch size: 32 | lm loss: 6.361919E+00 | loss scale: 16384.0 | grad norm: 60700.631 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3759/ 159576 | consumed samples: 73040 | elapsed time per iteration (ms): 14722.8 | learning rate: 2.022E-05 | global batch size: 32 | lm loss: 6.447733E+00 | loss scale: 16384.0 | grad norm: 87663.180 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3760/ 159576 | consumed samples: 73072 | elapsed time per iteration (ms): 14583.0 | learning rate: 2.023E-05 | global batch size: 32 | lm loss: 6.446470E+00 | loss scale: 16384.0 | grad norm: 67781.743 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3761/ 159576 | consumed samples: 73104 | elapsed time per iteration (ms): 14493.9 | learning rate: 2.024E-05 | global batch size: 32 | lm loss: 6.378415E+00 | loss scale: 16384.0 | grad norm: 72177.747 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3762/ 159576 | consumed samples: 73136 | elapsed time per iteration (ms): 14567.8 | learning rate: 2.025E-05 | global batch size: 32 | lm loss: 6.576702E+00 | loss scale: 16384.0 | grad norm: 87501.793 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3763/ 159576 | consumed samples: 73168 | elapsed time per iteration (ms): 14732.6 | learning rate: 2.026E-05 | global batch size: 32 | lm loss: 6.522850E+00 | loss scale: 16384.0 | grad norm: 66784.841 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3764/ 159576 | consumed samples: 73200 | elapsed time per iteration (ms): 14572.5 | learning rate: 2.027E-05 | global batch size: 32 | lm loss: 6.361198E+00 | loss scale: 16384.0 | grad norm: 85761.754 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3765/ 159576 | consumed samples: 73232 | elapsed time per iteration (ms): 14647.5 | learning rate: 2.028E-05 | global batch size: 32 | lm loss: 6.605127E+00 | loss scale: 16384.0 | grad norm: 69863.144 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3766/ 159576 | consumed samples: 73264 | elapsed time per iteration (ms): 14606.0 | learning rate: 2.029E-05 | global batch size: 32 | lm loss: 6.398610E+00 | loss scale: 16384.0 | grad norm: 94809.931 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3767/ 159576 | consumed samples: 73296 | elapsed time per iteration (ms): 14708.7 | learning rate: 2.029E-05 | global batch size: 32 | lm loss: 6.484084E+00 | loss scale: 16384.0 | grad norm: 74741.244 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3768/ 159576 | consumed samples: 73328 | elapsed time per iteration (ms): 14555.4 | learning rate: 2.030E-05 | global batch size: 32 | lm loss: 6.496735E+00 | loss scale: 16384.0 | grad norm: 77000.443 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3769/ 159576 | consumed samples: 73360 | elapsed time per iteration (ms): 14556.9 | learning rate: 2.031E-05 | global batch size: 32 | lm loss: 6.386226E+00 | loss scale: 16384.0 | grad norm: 92155.881 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3770/ 159576 | consumed samples: 73392 | elapsed time per iteration (ms): 14623.6 | learning rate: 2.032E-05 | global batch size: 32 | lm loss: 6.446381E+00 | loss scale: 16384.0 | grad norm: 91554.158 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3771/ 159576 | consumed samples: 73424 | elapsed time per iteration (ms): 14736.8 | learning rate: 2.033E-05 | global batch size: 32 | lm loss: 6.477424E+00 | loss scale: 16384.0 | grad norm: 79287.956 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3772/ 159576 | consumed samples: 73456 | elapsed time per iteration (ms): 14586.8 | learning rate: 2.034E-05 | global batch size: 32 | lm loss: 6.505037E+00 | loss scale: 16384.0 | grad norm: 76395.186 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3773/ 159576 | consumed samples: 73488 | elapsed time per iteration (ms): 14638.2 | learning rate: 2.035E-05 | global batch size: 32 | lm loss: 6.536213E+00 | loss scale: 16384.0 | grad norm: 64411.593 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3774/ 159576 | consumed samples: 73520 | elapsed time per iteration (ms): 14533.1 | learning rate: 2.036E-05 | global batch size: 32 | lm loss: 6.477271E+00 | loss scale: 16384.0 | grad norm: 79531.325 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3775/ 159576 | consumed samples: 73552 | elapsed time per iteration (ms): 14956.5 | learning rate: 2.037E-05 | global batch size: 32 | lm loss: 6.364020E+00 | loss scale: 16384.0 | grad norm: 72312.067 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3776/ 159576 | consumed samples: 73584 | elapsed time per iteration (ms): 14572.0 | learning rate: 2.037E-05 | global batch size: 32 | lm loss: 6.331044E+00 | loss scale: 16384.0 | grad norm: 84164.363 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3777/ 159576 | consumed samples: 73616 | elapsed time per iteration (ms): 14594.9 | learning rate: 2.038E-05 | global batch size: 32 | lm loss: 6.512950E+00 | loss scale: 16384.0 | grad norm: 77822.381 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3778/ 159576 | consumed samples: 73648 | elapsed time per iteration (ms): 14607.5 | learning rate: 2.039E-05 | global batch size: 32 | lm loss: 6.549839E+00 | loss scale: 16384.0 | grad norm: 66443.545 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3779/ 159576 | consumed samples: 73680 | elapsed time per iteration (ms): 14999.4 | learning rate: 2.040E-05 | global batch size: 32 | lm loss: 6.475536E+00 | loss scale: 16384.0 | grad norm: 88572.452 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3780/ 159576 | consumed samples: 73712 | elapsed time per iteration (ms): 14681.3 | learning rate: 2.041E-05 | global batch size: 32 | lm loss: 6.548042E+00 | loss scale: 16384.0 | grad norm: 74648.598 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3781/ 159576 | consumed samples: 73744 | elapsed time per iteration (ms): 14610.5 | learning rate: 2.042E-05 | global batch size: 32 | lm loss: 6.445394E+00 | loss scale: 16384.0 | grad norm: 79663.249 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3782/ 159576 | consumed samples: 73776 | elapsed time per iteration (ms): 14624.0 | learning rate: 2.043E-05 | global batch size: 32 | lm loss: 6.496744E+00 | loss scale: 16384.0 | grad norm: 77740.129 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3783/ 159576 | consumed samples: 73808 | elapsed time per iteration (ms): 15155.7 | learning rate: 2.044E-05 | global batch size: 32 | lm loss: 6.402834E+00 | loss scale: 16384.0 | grad norm: 74857.589 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3784/ 159576 | consumed samples: 73840 | elapsed time per iteration (ms): 14584.9 | learning rate: 2.045E-05 | global batch size: 32 | lm loss: 6.375038E+00 | loss scale: 16384.0 | grad norm: 86117.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3785/ 159576 | consumed samples: 73872 | elapsed time per iteration (ms): 14634.8 | learning rate: 2.045E-05 | global batch size: 32 | lm loss: 6.507965E+00 | loss scale: 16384.0 | grad norm: 78691.029 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3786/ 159576 | consumed samples: 73904 | elapsed time per iteration (ms): 14635.7 | learning rate: 2.046E-05 | global batch size: 32 | lm loss: 6.375463E+00 | loss scale: 16384.0 | grad norm: 105222.970 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3787/ 159576 | consumed samples: 73936 | elapsed time per iteration (ms): 14981.3 | learning rate: 2.047E-05 | global batch size: 32 | lm loss: 6.494486E+00 | loss scale: 16384.0 | grad norm: 70745.031 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3788/ 159576 | consumed samples: 73968 | elapsed time per iteration (ms): 14576.6 | learning rate: 2.048E-05 | global batch size: 32 | lm loss: 6.350873E+00 | loss scale: 16384.0 | grad norm: 81350.508 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3789/ 159576 | consumed samples: 74000 | elapsed time per iteration (ms): 14674.5 | learning rate: 2.049E-05 | global batch size: 32 | lm loss: 6.467069E+00 | loss scale: 16384.0 | grad norm: 84086.046 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3790/ 159576 | consumed samples: 74032 | elapsed time per iteration (ms): 14585.2 | learning rate: 2.050E-05 | global batch size: 32 | lm loss: 6.420381E+00 | loss scale: 16384.0 | grad norm: 79517.176 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3791/ 159576 | consumed samples: 74064 | elapsed time per iteration (ms): 14845.4 | learning rate: 2.051E-05 | global batch size: 32 | lm loss: 6.528859E+00 | loss scale: 16384.0 | grad norm: 87747.432 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3792/ 159576 | consumed samples: 74096 | elapsed time per iteration (ms): 14671.9 | learning rate: 2.052E-05 | global batch size: 32 | lm loss: 6.445452E+00 | loss scale: 16384.0 | grad norm: 76185.902 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3793/ 159576 | consumed samples: 74128 | elapsed time per iteration (ms): 14614.2 | learning rate: 2.053E-05 | global batch size: 32 | lm loss: 6.579043E+00 | loss scale: 16384.0 | grad norm: 85891.467 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3794/ 159576 | consumed samples: 74160 | elapsed time per iteration (ms): 14636.7 | learning rate: 2.053E-05 | global batch size: 32 | lm loss: 6.481782E+00 | loss scale: 16384.0 | grad norm: 62633.733 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3795/ 159576 | consumed samples: 74192 | elapsed time per iteration (ms): 14963.5 | learning rate: 2.054E-05 | global batch size: 32 | lm loss: 6.517486E+00 | loss scale: 16384.0 | grad norm: 67403.184 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3796/ 159576 | consumed samples: 74224 | elapsed time per iteration (ms): 14620.1 | learning rate: 2.055E-05 | global batch size: 32 | lm loss: 6.417095E+00 | loss scale: 16384.0 | grad norm: 62157.167 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3797/ 159576 | consumed samples: 74256 | elapsed time per iteration (ms): 14620.8 | learning rate: 2.056E-05 | global batch size: 32 | lm loss: 6.419306E+00 | loss scale: 16384.0 | grad norm: 73456.568 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3798/ 159576 | consumed samples: 74288 | elapsed time per iteration (ms): 14577.9 | learning rate: 2.057E-05 | global batch size: 32 | lm loss: 6.487021E+00 | loss scale: 16384.0 | grad norm: 67613.601 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3799/ 159576 | consumed samples: 74320 | elapsed time per iteration (ms): 14963.8 | learning rate: 2.058E-05 | global batch size: 32 | lm loss: 6.459682E+00 | loss scale: 16384.0 | grad norm: 73515.295 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3800/ 159576 | consumed samples: 74352 | elapsed time per iteration (ms): 14567.9 | learning rate: 2.059E-05 | global batch size: 32 | lm loss: 6.321566E+00 | loss scale: 16384.0 | grad norm: 77546.231 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3801/ 159576 | consumed samples: 74384 | elapsed time per iteration (ms): 14600.7 | learning rate: 2.060E-05 | global batch size: 32 | lm loss: 6.582398E+00 | loss scale: 16384.0 | grad norm: 78424.143 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3802/ 159576 | consumed samples: 74416 | elapsed time per iteration (ms): 14644.4 | learning rate: 2.061E-05 | global batch size: 32 | lm loss: 6.394701E+00 | loss scale: 16384.0 | grad norm: 82174.617 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3803/ 159576 | consumed samples: 74448 | elapsed time per iteration (ms): 14905.7 | learning rate: 2.061E-05 | global batch size: 32 | lm loss: 6.388845E+00 | loss scale: 16384.0 | grad norm: 67050.595 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3804/ 159576 | consumed samples: 74480 | elapsed time per iteration (ms): 14636.0 | learning rate: 2.062E-05 | global batch size: 32 | lm loss: 6.513092E+00 | loss scale: 16384.0 | grad norm: 118423.488 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3805/ 159576 | consumed samples: 74512 | elapsed time per iteration (ms): 14511.9 | learning rate: 2.063E-05 | global batch size: 32 | lm loss: 6.418696E+00 | loss scale: 16384.0 | grad norm: 71096.098 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3806/ 159576 | consumed samples: 74544 | elapsed time per iteration (ms): 14523.9 | learning rate: 2.064E-05 | global batch size: 32 | lm loss: 6.286570E+00 | loss scale: 16384.0 | grad norm: 93004.901 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3807/ 159576 | consumed samples: 74576 | elapsed time per iteration (ms): 14509.8 | learning rate: 2.065E-05 | global batch size: 32 | lm loss: 6.565314E+00 | loss scale: 16384.0 | grad norm: 76207.242 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3808/ 159576 | consumed samples: 74608 | elapsed time per iteration (ms): 15001.7 | learning rate: 2.066E-05 | global batch size: 32 | lm loss: 6.597963E+00 | loss scale: 16384.0 | grad norm: 136405.382 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3809/ 159576 | consumed samples: 74640 | elapsed time per iteration (ms): 14540.5 | learning rate: 2.067E-05 | global batch size: 32 | lm loss: 6.619783E+00 | loss scale: 16384.0 | grad norm: 75270.102 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3810/ 159576 | consumed samples: 74672 | elapsed time per iteration (ms): 14582.3 | learning rate: 2.068E-05 | global batch size: 32 | lm loss: 6.406981E+00 | loss scale: 16384.0 | grad norm: 81052.948 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3811/ 159576 | consumed samples: 74704 | elapsed time per iteration (ms): 14512.1 | learning rate: 2.068E-05 | global batch size: 32 | lm loss: 6.487488E+00 | loss scale: 16384.0 | grad norm: 87400.939 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3812/ 159576 | consumed samples: 74736 | elapsed time per iteration (ms): 14767.4 | learning rate: 2.069E-05 | global batch size: 32 | lm loss: 6.416305E+00 | loss scale: 16384.0 | grad norm: 104809.852 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3813/ 159576 | consumed samples: 74768 | elapsed time per iteration (ms): 14457.6 | learning rate: 2.070E-05 | global batch size: 32 | lm loss: 6.405777E+00 | loss scale: 16384.0 | grad norm: 79282.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3814/ 159576 | consumed samples: 74800 | elapsed time per iteration (ms): 14520.7 | learning rate: 2.071E-05 | global batch size: 32 | lm loss: 6.435395E+00 | loss scale: 16384.0 | grad norm: 75788.929 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3815/ 159576 | consumed samples: 74832 | elapsed time per iteration (ms): 14520.3 | learning rate: 2.072E-05 | global batch size: 32 | lm loss: 6.324138E+00 | loss scale: 16384.0 | grad norm: 77448.416 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3816/ 159576 | consumed samples: 74864 | elapsed time per iteration (ms): 14756.0 | learning rate: 2.073E-05 | global batch size: 32 | lm loss: 6.479269E+00 | loss scale: 16384.0 | grad norm: 80928.548 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3817/ 159576 | consumed samples: 74896 | elapsed time per iteration (ms): 14631.8 | learning rate: 2.074E-05 | global batch size: 32 | lm loss: 6.448977E+00 | loss scale: 16384.0 | grad norm: 81667.758 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3818/ 159576 | consumed samples: 74928 | elapsed time per iteration (ms): 14631.1 | learning rate: 2.075E-05 | global batch size: 32 | lm loss: 6.550106E+00 | loss scale: 16384.0 | grad norm: 65592.243 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3819/ 159576 | consumed samples: 74960 | elapsed time per iteration (ms): 14596.0 | learning rate: 2.076E-05 | global batch size: 32 | lm loss: 6.589513E+00 | loss scale: 16384.0 | grad norm: 93606.472 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3820/ 159576 | consumed samples: 74992 | elapsed time per iteration (ms): 14800.0 | learning rate: 2.076E-05 | global batch size: 32 | lm loss: 6.472552E+00 | loss scale: 16384.0 | grad norm: 63974.308 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3821/ 159576 | consumed samples: 75024 | elapsed time per iteration (ms): 14588.9 | learning rate: 2.077E-05 | global batch size: 32 | lm loss: 6.366886E+00 | loss scale: 16384.0 | grad norm: 87736.372 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3822/ 159576 | consumed samples: 75056 | elapsed time per iteration (ms): 14606.9 | learning rate: 2.078E-05 | global batch size: 32 | lm loss: 6.523769E+00 | loss scale: 16384.0 | grad norm: 81803.750 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3823/ 159576 | consumed samples: 75088 | elapsed time per iteration (ms): 14588.5 | learning rate: 2.079E-05 | global batch size: 32 | lm loss: 6.495326E+00 | loss scale: 16384.0 | grad norm: 63058.316 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3824/ 159576 | consumed samples: 75120 | elapsed time per iteration (ms): 14986.3 |
learning rate: 2.080E-05 | global batch size: 32 | lm loss: 6.557096E+00 | loss scale: 16384.0 | grad norm: 80592.237 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3825/ 159576 | consumed samples: 75152 | elapsed time per iteration (ms): 14684.5 | learning rate: 2.081E-05 | global batch size: 32 | lm loss: 6.436917E+00 | loss scale: 16384.0 | grad norm: 70196.423 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3826/ 159576 | consumed samples: 75184 | elapsed time per iteration (ms): 14627.4 | learning rate: 2.082E-05 | global batch size: 32 | lm loss: 6.413104E+00 | loss scale: 16384.0 | grad norm: 81085.772 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) [2021-09-24 17:07:03] PULSE: tr8-104B is waiting for the previous job to finish before scheduling a new one using the dependency mechanism (1165978_[1-10%1] on 'gpu_p13' partition) [2021-09-24 17:07:03] PULSE: tr8-104B is running for 11:14:52 since 2021-09-24T05:52:11 (1162855_1 on 'gpu_p13' partition (r6i4n[5,7],r6i5n[2,7-8],r6i6n[0,2,6],r7i2n[4-5],r7i6n[2-4],r7i7n[7-8],r8i0n[2-3,5-8],r8i1n[0,2-4],r8i2n8,r8i3n[0-2],r8i5n[3-4],r8i7n[3-8],r9i0n[0-2],r9i1n[0-3],r9i2n[3-5,8],r9i3n[0-1,7-8],r9i4n[0-2],r9i5n[3-8],r9i6n[0,7-8]) iteration 3827/ 159576 | consumed samples: 75216 | elapsed time per iteration (ms): 14531.9 | learning rate: 2.083E-05 | global batch size: 32 | lm loss: 6.427704E+00 | loss scale: 16384.0 | grad norm: 68943.205 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3828/ 159576 | consumed samples: 75248 | elapsed time per iteration (ms): 14988.1 | learning rate: 2.084E-05 | global batch size: 32 | lm loss: 6.347779E+00 | loss scale: 16384.0 | grad norm: 64095.719 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3829/ 159576 | consumed 
samples: 75280 | elapsed time per iteration (ms): 14665.9 | learning rate: 2.084E-05 | global batch size: 32 | lm loss: 6.411919E+00 | loss scale: 16384.0 | grad norm: 82008.163 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3830/ 159576 | consumed samples: 75312 | elapsed time per iteration (ms): 14539.9 | learning rate: 2.085E-05 | global batch size: 32 | lm loss: 6.458866E+00 | loss scale: 16384.0 | grad norm: 67971.949 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3831/ 159576 | consumed samples: 75344 | elapsed time per iteration (ms): 14600.2 | learning rate: 2.086E-05 | global batch size: 32 | lm loss: 6.450158E+00 | loss scale: 16384.0 | grad norm: 59376.432 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3832/ 159576 | consumed samples: 75376 | elapsed time per iteration (ms): 14931.8 | learning rate: 2.087E-05 | global batch size: 32 | lm loss: 6.537256E+00 | loss scale: 16384.0 | grad norm: 77538.560 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3833/ 159576 | consumed samples: 75408 | elapsed time per iteration (ms): 14592.6 | learning rate: 2.088E-05 | global batch size: 32 | lm loss: 6.392985E+00 | loss scale: 16384.0 | grad norm: 84275.600 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3834/ 159576 | consumed samples: 75440 | elapsed time per iteration (ms): 14616.6 | learning rate: 2.089E-05 | global batch size: 32 | lm loss: 6.512251E+00 | loss scale: 16384.0 | grad norm: 80167.095 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3835/ 159576 | consumed samples: 75472 | elapsed time per iteration (ms): 14584.0 | learning rate: 2.090E-05 | global batch size: 32 | lm loss: 6.467295E+00 | loss scale: 16384.0 | grad norm: 
85124.328 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3836/ 159576 | consumed samples: 75504 | elapsed time per iteration (ms): 14844.3 | learning rate: 2.091E-05 | global batch size: 32 | lm loss: 6.514040E+00 | loss scale: 16384.0 | grad norm: 71539.963 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3837/ 159576 | consumed samples: 75536 | elapsed time per iteration (ms): 14618.8 | learning rate: 2.092E-05 | global batch size: 32 | lm loss: 6.519591E+00 | loss scale: 16384.0 | grad norm: 89173.398 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3838/ 159576 | consumed samples: 75568 | elapsed time per iteration (ms): 14566.0 | learning rate: 2.092E-05 | global batch size: 32 | lm loss: 6.447284E+00 | loss scale: 16384.0 | grad norm: 86030.395 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3839/ 159576 | consumed samples: 75600 | elapsed time per iteration (ms): 14636.3 | learning rate: 2.093E-05 | global batch size: 32 | lm loss: 6.369718E+00 | loss scale: 16384.0 | grad norm: 66275.400 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3840/ 159576 | consumed samples: 75632 | elapsed time per iteration (ms): 14897.9 | learning rate: 2.094E-05 | global batch size: 32 | lm loss: 6.467171E+00 | loss scale: 16384.0 | grad norm: 82043.402 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3841/ 159576 | consumed samples: 75664 | elapsed time per iteration (ms): 14554.8 | learning rate: 2.095E-05 | global batch size: 32 | lm loss: 6.458669E+00 | loss scale: 16384.0 | grad norm: 73761.762 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3842/ 159576 | consumed samples: 75696 | elapsed time 
per iteration (ms): 14564.2 | learning rate: 2.096E-05 | global batch size: 32 | lm loss: 6.516797E+00 | loss scale: 16384.0 | grad norm: 83647.133 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3843/ 159576 | consumed samples: 75728 | elapsed time per iteration (ms): 14464.9 | learning rate: 2.097E-05 | global batch size: 32 | lm loss: 6.381551E+00 | loss scale: 16384.0 | grad norm: 58297.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3844/ 159576 | consumed samples: 75760 | elapsed time per iteration (ms): 14942.4 | learning rate: 2.098E-05 | global batch size: 32 | lm loss: 6.471825E+00 | loss scale: 16384.0 | grad norm: 82881.127 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3845/ 159576 | consumed samples: 75792 | elapsed time per iteration (ms): 14531.3 | learning rate: 2.099E-05 | global batch size: 32 | lm loss: 6.528457E+00 | loss scale: 16384.0 | grad norm: 67296.172 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3846/ 159576 | consumed samples: 75824 | elapsed time per iteration (ms): 14601.9 | learning rate: 2.100E-05 | global batch size: 32 | lm loss: 6.408827E+00 | loss scale: 16384.0 | grad norm: 67512.624 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3847/ 159576 | consumed samples: 75856 | elapsed time per iteration (ms): 14580.2 | learning rate: 2.100E-05 | global batch size: 32 | lm loss: 6.440091E+00 | loss scale: 16384.0 | grad norm: 78400.656 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3848/ 159576 | consumed samples: 75888 | elapsed time per iteration (ms): 14911.9 | learning rate: 2.101E-05 | global batch size: 32 | lm loss: 6.374573E+00 | loss scale: 16384.0 | grad norm: 85886.969 | num zeros: 0.0 | 
number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3849/ 159576 | consumed samples: 75920 | elapsed time per iteration (ms): 14768.3 | learning rate: 2.102E-05 | global batch size: 32 | lm loss: 6.529835E+00 | loss scale: 16384.0 | grad norm: 71394.057 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3850/ 159576 | consumed samples: 75952 | elapsed time per iteration (ms): 14553.3 | learning rate: 2.103E-05 | global batch size: 32 | lm loss: 6.455585E+00 | loss scale: 16384.0 | grad norm: 67772.089 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3851/ 159576 | consumed samples: 75984 | elapsed time per iteration (ms): 14574.9 | learning rate: 2.104E-05 | global batch size: 32 | lm loss: 6.428284E+00 | loss scale: 16384.0 | grad norm: 110864.098 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3852/ 159576 | consumed samples: 76016 | elapsed time per iteration (ms): 14592.6 | learning rate: 2.105E-05 | global batch size: 32 | lm loss: 6.457644E+00 | loss scale: 16384.0 | grad norm: 73499.592 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3853/ 159576 | consumed samples: 76048 | elapsed time per iteration (ms): 14780.7 | learning rate: 2.106E-05 | global batch size: 32 | lm loss: 6.459057E+00 | loss scale: 16384.0 | grad norm: 71503.908 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3854/ 159576 | consumed samples: 76080 | elapsed time per iteration (ms): 14631.9 | learning rate: 2.107E-05 | global batch size: 32 | lm loss: 6.522111E+00 | loss scale: 16384.0 | grad norm: 73205.829 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3855/ 159576 | consumed samples: 76112 | elapsed time per iteration (ms): 14685.7 | 
learning rate: 2.108E-05 | global batch size: 32 | lm loss: 6.444643E+00 | loss scale: 16384.0 | grad norm: 70169.559 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3856/ 159576 | consumed samples: 76144 | elapsed time per iteration (ms): 14534.2 | learning rate: 2.108E-05 | global batch size: 32 | lm loss: 6.392300E+00 | loss scale: 16384.0 | grad norm: 81224.688 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3857/ 159576 | consumed samples: 76176 | elapsed time per iteration (ms): 14734.9 | learning rate: 2.109E-05 | global batch size: 32 | lm loss: 6.474737E+00 | loss scale: 16384.0 | grad norm: 76429.789 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3858/ 159576 | consumed samples: 76208 | elapsed time per iteration (ms): 14589.1 | learning rate: 2.110E-05 | global batch size: 32 | lm loss: 6.481500E+00 | loss scale: 16384.0 | grad norm: 76288.617 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3859/ 159576 | consumed samples: 76240 | elapsed time per iteration (ms): 14536.6 | learning rate: 2.111E-05 | global batch size: 32 | lm loss: 6.504058E+00 | loss scale: 16384.0 | grad norm: 75104.955 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3860/ 159576 | consumed samples: 76272 | elapsed time per iteration (ms): 14557.4 | learning rate: 2.112E-05 | global batch size: 32 | lm loss: 6.616935E+00 | loss scale: 16384.0 | grad norm: 73471.312 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3861/ 159576 | consumed samples: 76304 | elapsed time per iteration (ms): 14996.3 | learning rate: 2.113E-05 | global batch size: 32 | lm loss: 6.437632E+00 | loss scale: 16384.0 | grad norm: 100626.814 | num zeros: 0.0 | number of skipped iterations: 0 
| number of nan iterations: 0 | time (ms) iteration 3862/ 159576 | consumed samples: 76336 | elapsed time per iteration (ms): 14610.8 | learning rate: 2.114E-05 | global batch size: 32 | lm loss: 6.358921E+00 | loss scale: 16384.0 | grad norm: 84367.846 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3863/ 159576 | consumed samples: 76368 | elapsed time per iteration (ms): 14574.0 | learning rate: 2.115E-05 | global batch size: 32 | lm loss: 6.489450E+00 | loss scale: 16384.0 | grad norm: 111308.083 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3864/ 159576 | consumed samples: 76400 | elapsed time per iteration (ms): 14585.8 | learning rate: 2.116E-05 | global batch size: 32 | lm loss: 6.579299E+00 | loss scale: 16384.0 | grad norm: 71685.183 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3865/ 159576 | consumed samples: 76432 | elapsed time per iteration (ms): 14801.5 | learning rate: 2.116E-05 | global batch size: 32 | lm loss: 6.356242E+00 | loss scale: 16384.0 | grad norm: 68636.493 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3866/ 159576 | consumed samples: 76464 | elapsed time per iteration (ms): 14581.8 | learning rate: 2.117E-05 | global batch size: 32 | lm loss: 6.583051E+00 | loss scale: 16384.0 | grad norm: 83498.983 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3867/ 159576 | consumed samples: 76496 | elapsed time per iteration (ms): 14548.1 | learning rate: 2.118E-05 | global batch size: 32 | lm loss: 6.414474E+00 | loss scale: 16384.0 | grad norm: 70120.527 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3868/ 159576 | consumed samples: 76528 | elapsed time per iteration (ms): 14581.2 | learning rate: 2.119E-05 | 
global batch size: 32 | lm loss: 6.383676E+00 | loss scale: 16384.0 | grad norm: 65625.290 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3869/ 159576 | consumed samples: 76560 | elapsed time per iteration (ms): 14975.0 | learning rate: 2.120E-05 | global batch size: 32 | lm loss: 6.553302E+00 | loss scale: 16384.0 | grad norm: 78443.319 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3870/ 159576 | consumed samples: 76592 | elapsed time per iteration (ms): 14654.1 | learning rate: 2.121E-05 | global batch size: 32 | lm loss: 6.525763E+00 | loss scale: 16384.0 | grad norm: 74575.789 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3871/ 159576 | consumed samples: 76624 | elapsed time per iteration (ms): 14658.5 | learning rate: 2.122E-05 | global batch size: 32 | lm loss: 6.416959E+00 | loss scale: 16384.0 | grad norm: 61001.593 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3872/ 159576 | consumed samples: 76656 | elapsed time per iteration (ms): 14544.3 | learning rate: 2.123E-05 | global batch size: 32 | lm loss: 6.516649E+00 | loss scale: 16384.0 | grad norm: 76582.538 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3873/ 159576 | consumed samples: 76688 | elapsed time per iteration (ms): 14961.2 | learning rate: 2.124E-05 | global batch size: 32 | lm loss: 6.532383E+00 | loss scale: 16384.0 | grad norm: 98540.585 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3874/ 159576 | consumed samples: 76720 | elapsed time per iteration (ms): 14595.7 | learning rate: 2.124E-05 | global batch size: 32 | lm loss: 6.589262E+00 | loss scale: 16384.0 | grad norm: 90020.937 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 
0 | time (ms) iteration 3875/ 159576 | consumed samples: 76752 | elapsed time per iteration (ms): 14549.8 | learning rate: 2.125E-05 | global batch size: 32 | lm loss: 6.475612E+00 | loss scale: 16384.0 | grad norm: 71253.795 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3876/ 159576 | consumed samples: 76784 | elapsed time per iteration (ms): 14539.7 | learning rate: 2.126E-05 | global batch size: 32 | lm loss: 6.477540E+00 | loss scale: 16384.0 | grad norm: 113904.264 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3877/ 159576 | consumed samples: 76816 | elapsed time per iteration (ms): 14922.4 | learning rate: 2.127E-05 | global batch size: 32 | lm loss: 6.475825E+00 | loss scale: 16384.0 | grad norm: 59736.077 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3878/ 159576 | consumed samples: 76848 | elapsed time per iteration (ms): 14676.0 | learning rate: 2.128E-05 | global batch size: 32 | lm loss: 6.477038E+00 | loss scale: 16384.0 | grad norm: 73926.427 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3879/ 159576 | consumed samples: 76880 | elapsed time per iteration (ms): 14505.4 | learning rate: 2.129E-05 | global batch size: 32 | lm loss: 6.577363E+00 | loss scale: 16384.0 | grad norm: 65273.771 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3880/ 159576 | consumed samples: 76912 | elapsed time per iteration (ms): 14525.2 | learning rate: 2.130E-05 | global batch size: 32 | lm loss: 6.431276E+00 | loss scale: 16384.0 | grad norm: 62353.041 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3881/ 159576 | consumed samples: 76944 | elapsed time per iteration (ms): 14918.9 | learning rate: 2.131E-05 | global batch size: 32 | lm loss: 
6.471975E+00 | loss scale: 16384.0 | grad norm: 80402.399 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3882/ 159576 | consumed samples: 76976 | elapsed time per iteration (ms): 14543.5 | learning rate: 2.132E-05 | global batch size: 32 | lm loss: 6.481179E+00 | loss scale: 16384.0 | grad norm: 59241.446 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3883/ 159576 | consumed samples: 77008 | elapsed time per iteration (ms): 14519.1 | learning rate: 2.132E-05 | global batch size: 32 | lm loss: 6.356431E+00 | loss scale: 16384.0 | grad norm: 66124.949 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3884/ 159576 | consumed samples: 77040 | elapsed time per iteration (ms): 14635.6 | learning rate: 2.133E-05 | global batch size: 32 | lm loss: 7.171796E+00 | loss scale: 16384.0 | grad norm: 628102.297 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3885/ 159576 | consumed samples: 77072 | elapsed time per iteration (ms): 14877.6 | learning rate: 2.134E-05 | global batch size: 32 | lm loss: 7.122965E+00 | loss scale: 16384.0 | grad norm: 105361.079 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3886/ 159576 | consumed samples: 77104 | elapsed time per iteration (ms): 14581.7 | learning rate: 2.135E-05 | global batch size: 32 | lm loss: 6.781033E+00 | loss scale: 16384.0 | grad norm: 90805.956 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3887/ 159576 | consumed samples: 77136 | elapsed time per iteration (ms): 14580.5 | learning rate: 2.136E-05 | global batch size: 32 | lm loss: 6.824611E+00 | loss scale: 16384.0 | grad norm: 128888.283 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3888/ 
159576 | consumed samples: 77168 | elapsed time per iteration (ms): 14468.4 | learning rate: 2.137E-05 | global batch size: 32 | lm loss: 6.773994E+00 | loss scale: 16384.0 | grad norm: 67441.277 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3889/ 159576 | consumed samples: 77200 | elapsed time per iteration (ms): 14934.3 | learning rate: 2.138E-05 | global batch size: 32 | lm loss: 6.845183E+00 | loss scale: 16384.0 | grad norm: 171660.767 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3890/ 159576 | consumed samples: 77232 | elapsed time per iteration (ms): 14531.8 | learning rate: 2.139E-05 | global batch size: 32 | lm loss: 6.803124E+00 | loss scale: 16384.0 | grad norm: 100767.890 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3891/ 159576 | consumed samples: 77264 | elapsed time per iteration (ms): 14568.7 | learning rate: 2.139E-05 | global batch size: 32 | lm loss: 6.825951E+00 | loss scale: 16384.0 | grad norm: 84326.742 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3892/ 159576 | consumed samples: 77296 | elapsed time per iteration (ms): 14543.8 | learning rate: 2.140E-05 | global batch size: 32 | lm loss: 6.734772E+00 | loss scale: 16384.0 | grad norm: 87236.773 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3893/ 159576 | consumed samples: 77328 | elapsed time per iteration (ms): 14607.7 | learning rate: 2.141E-05 | global batch size: 32 | lm loss: 6.789660E+00 | loss scale: 16384.0 | grad norm: 88054.207 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3894/ 159576 | consumed samples: 77360 | elapsed time per iteration (ms): 14920.9 | learning rate: 2.142E-05 | global batch size: 32 | lm loss: 6.710454E+00 | loss scale: 
16384.0 | grad norm: 182978.046 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3895/ 159576 | consumed samples: 77392 | elapsed time per iteration (ms): 14510.2 | learning rate: 2.143E-05 | global batch size: 32 | lm loss: 6.691602E+00 | loss scale: 16384.0 | grad norm: 119037.944 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3896/ 159576 | consumed samples: 77424 | elapsed time per iteration (ms): 14496.2 | learning rate: 2.144E-05 | global batch size: 32 | lm loss: 6.739342E+00 | loss scale: 16384.0 | grad norm: 97461.502 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3897/ 159576 | consumed samples: 77456 | elapsed time per iteration (ms): 14526.7 | learning rate: 2.145E-05 | global batch size: 32 | lm loss: 6.818674E+00 | loss scale: 16384.0 | grad norm: 86334.005 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3898/ 159576 | consumed samples: 77488 | elapsed time per iteration (ms): 14792.9 | learning rate: 2.146E-05 | global batch size: 32 | lm loss: 6.717194E+00 | loss scale: 16384.0 | grad norm: 113951.645 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3899/ 159576 | consumed samples: 77520 | elapsed time per iteration (ms): 14491.5 | learning rate: 2.147E-05 | global batch size: 32 | lm loss: 6.714782E+00 | loss scale: 16384.0 | grad norm: 99766.959 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3900/ 159576 | consumed samples: 77552 | elapsed time per iteration (ms): 14584.1 | learning rate: 2.147E-05 | global batch size: 32 | lm loss: 6.659179E+00 | loss scale: 16384.0 | grad norm: 89663.421 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3901/ 159576 | consumed samples: 
77584 | elapsed time per iteration (ms): 14629.2 | learning rate: 2.148E-05 | global batch size: 32 | lm loss: 6.615579E+00 | loss scale: 16384.0 | grad norm: 68957.535 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3902/ 159576 | consumed samples: 77616 | elapsed time per iteration (ms): 14617.9 | learning rate: 2.149E-05 | global batch size: 32 | lm loss: 6.606854E+00 | loss scale: 16384.0 | grad norm: 99968.600 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3903/ 159576 | consumed samples: 77648 | elapsed time per iteration (ms): 14554.1 | learning rate: 2.150E-05 | global batch size: 32 | lm loss: 6.537298E+00 | loss scale: 16384.0 | grad norm: 67921.849 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3904/ 159576 | consumed samples: 77680 | elapsed time per iteration (ms): 14545.4 | learning rate: 2.151E-05 | global batch size: 32 | lm loss: 6.606940E+00 | loss scale: 16384.0 | grad norm: 145573.785 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3905/ 159576 | consumed samples: 77712 | elapsed time per iteration (ms): 14521.9 | learning rate: 2.152E-05 | global batch size: 32 | lm loss: 6.625298E+00 | loss scale: 16384.0 | grad norm: 96778.059 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3906/ 159576 | consumed samples: 77744 | elapsed time per iteration (ms): 14699.2 | learning rate: 2.153E-05 | global batch size: 32 | lm loss: 6.624491E+00 | loss scale: 16384.0 | grad norm: 92738.461 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3907/ 159576 | consumed samples: 77776 | elapsed time per iteration (ms): 14558.6 | learning rate: 2.154E-05 | global batch size: 32 | lm loss: 6.825802E+00 | loss scale: 16384.0 | grad norm: 
119492.559 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3908/ 159576 | consumed samples: 77808 | elapsed time per iteration (ms): 14547.7 | learning rate: 2.155E-05 | global batch size: 32 | lm loss: 6.591653E+00 | loss scale: 16384.0 | grad norm: 78761.796 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3909/ 159576 | consumed samples: 77840 | elapsed time per iteration (ms): 14554.0 | learning rate: 2.155E-05 | global batch size: 32 | lm loss: 6.567001E+00 | loss scale: 16384.0 | grad norm: 147075.233 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3910/ 159576 | consumed samples: 77872 | elapsed time per iteration (ms): 15013.4 | learning rate: 2.156E-05 | global batch size: 32 | lm loss: 6.787440E+00 | loss scale: 16384.0 | grad norm: 142314.988 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3911/ 159576 | consumed samples: 77904 | elapsed time per iteration (ms): 14566.2 | learning rate: 2.157E-05 | global batch size: 32 | lm loss: 6.525432E+00 | loss scale: 16384.0 | grad norm: 87369.307 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3912/ 159576 | consumed samples: 77936 | elapsed time per iteration (ms): 14516.0 | learning rate: 2.158E-05 | global batch size: 32 | lm loss: 6.615817E+00 | loss scale: 16384.0 | grad norm: 83904.990 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3913/ 159576 | consumed samples: 77968 | elapsed time per iteration (ms): 14525.8 | learning rate: 2.159E-05 | global batch size: 32 | lm loss: 6.564670E+00 | loss scale: 16384.0 | grad norm: 97516.560 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3914/ 159576 | consumed samples: 78000 | elapsed time 
per iteration (ms): 15027.0 | learning rate: 2.160E-05 | global batch size: 32 | lm loss: 6.400544E+00 | loss scale: 16384.0 | grad norm: 92743.388 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3915/ 159576 | consumed samples: 78032 | elapsed time per iteration (ms): 14573.6 | learning rate: 2.161E-05 | global batch size: 32 | lm loss: 6.603245E+00 | loss scale: 16384.0 | grad norm: 106541.895 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3916/ 159576 | consumed samples: 78064 | elapsed time per iteration (ms): 14538.9 | learning rate: 2.162E-05 | global batch size: 32 | lm loss: 6.560642E+00 | loss scale: 16384.0 | grad norm: 71313.618 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3917/ 159576 | consumed samples: 78096 | elapsed time per iteration (ms): 14550.2 | learning rate: 2.163E-05 | global batch size: 32 | lm loss: 6.578140E+00 | loss scale: 16384.0 | grad norm: 83812.809 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3918/ 159576 | consumed samples: 78128 | elapsed time per iteration (ms): 14857.6 | learning rate: 2.163E-05 | global batch size: 32 | lm loss: 6.583351E+00 | loss scale: 16384.0 | grad norm: 69616.816 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3919/ 159576 | consumed samples: 78160 | elapsed time per iteration (ms): 14509.2 | learning rate: 2.164E-05 | global batch size: 32 | lm loss: 6.595952E+00 | loss scale: 16384.0 | grad norm: 83133.116 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3920/ 159576 | consumed samples: 78192 | elapsed time per iteration (ms): 14502.7 | learning rate: 2.165E-05 | global batch size: 32 | lm loss: 6.645111E+00 | loss scale: 16384.0 | grad norm: 69570.909 | num zeros: 0.0 | 
number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3921/ 159576 | consumed samples: 78224 | elapsed time per iteration (ms): 14498.8 | learning rate: 2.166E-05 | global batch size: 32 | lm loss: 6.553501E+00 | loss scale: 16384.0 | grad norm: 142896.192 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3922/ 159576 | consumed samples: 78256 | elapsed time per iteration (ms): 14842.1 | learning rate: 2.167E-05 | global batch size: 32 | lm loss: 6.687614E+00 | loss scale: 16384.0 | grad norm: 107346.964 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3923/ 159576 | consumed samples: 78288 | elapsed time per iteration (ms): 14567.6 | learning rate: 2.168E-05 | global batch size: 32 | lm loss: 6.764112E+00 | loss scale: 16384.0 | grad norm: 75484.388 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3924/ 159576 | consumed samples: 78320 | elapsed time per iteration (ms): 14603.6 | learning rate: 2.169E-05 | global batch size: 32 | lm loss: 6.384696E+00 | loss scale: 16384.0 | grad norm: 91570.469 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3925/ 159576 | consumed samples: 78352 | elapsed time per iteration (ms): 14494.1 | learning rate: 2.170E-05 | global batch size: 32 | lm loss: 6.148740E+00 | loss scale: 16384.0 | grad norm: 66094.874 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3926/ 159576 | consumed samples: 78384 | elapsed time per iteration (ms): 14880.0 | learning rate: 2.171E-05 | global batch size: 32 | lm loss: 6.492467E+00 | loss scale: 16384.0 | grad norm: 95980.364 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3927/ 159576 | consumed samples: 78416 | elapsed time per iteration (ms): 14529.0 | 
learning rate: 2.171E-05 | global batch size: 32 | lm loss: 6.634668E+00 | loss scale: 16384.0 | grad norm: 102240.933 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3928/ 159576 | consumed samples: 78448 | elapsed time per iteration (ms): 14524.9 | learning rate: 2.172E-05 | global batch size: 32 | lm loss: 6.542571E+00 | loss scale: 16384.0 | grad norm: 78190.337 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3929/ 159576 | consumed samples: 78480 | elapsed time per iteration (ms): 14519.9 | learning rate: 2.173E-05 | global batch size: 32 | lm loss: 6.546354E+00 | loss scale: 16384.0 | grad norm: 69181.655 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3930/ 159576 | consumed samples: 78512 | elapsed time per iteration (ms): 14848.7 | learning rate: 2.174E-05 | global batch size: 32 | lm loss: 6.556016E+00 | loss scale: 16384.0 | grad norm: 166890.175 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3931/ 159576 | consumed samples: 78544 | elapsed time per iteration (ms): 14630.3 | learning rate: 2.175E-05 | global batch size: 32 | lm loss: 6.575625E+00 | loss scale: 16384.0 | grad norm: 67026.457 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3932/ 159576 | consumed samples: 78576 | elapsed time per iteration (ms): 14503.2 | learning rate: 2.176E-05 | global batch size: 32 | lm loss: 6.528583E+00 | loss scale: 16384.0 | grad norm: 65300.446 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3933/ 159576 | consumed samples: 78608 | elapsed time per iteration (ms): 14533.6 | learning rate: 2.177E-05 | global batch size: 32 | lm loss: 6.571996E+00 | loss scale: 16384.0 | grad norm: 61530.557 | num zeros: 0.0 | number of skipped iterations: 
0 | number of nan iterations: 0 | time (ms) iteration 3934/ 159576 | consumed samples: 78640 | elapsed time per iteration (ms): 14528.2 | learning rate: 2.178E-05 | global batch size: 32 | lm loss: 6.524823E+00 | loss scale: 16384.0 | grad norm: 58107.513 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3935/ 159576 | consumed samples: 78672 | elapsed time per iteration (ms): 14801.4 | learning rate: 2.179E-05 | global batch size: 32 | lm loss: 6.627916E+00 | loss scale: 16384.0 | grad norm: 64798.821 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3936/ 159576 | consumed samples: 78704 | elapsed time per iteration (ms): 14509.3 | learning rate: 2.179E-05 | global batch size: 32 | lm loss: 6.511620E+00 | loss scale: 16384.0 | grad norm: 59258.569 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3937/ 159576 | consumed samples: 78736 | elapsed time per iteration (ms): 14529.7 | learning rate: 2.180E-05 | global batch size: 32 | lm loss: 6.414696E+00 | loss scale: 16384.0 | grad norm: 75598.973 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3938/ 159576 | consumed samples: 78768 | elapsed time per iteration (ms): 14568.6 | learning rate: 2.181E-05 | global batch size: 32 | lm loss: 6.692476E+00 | loss scale: 16384.0 | grad norm: 68594.644 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3939/ 159576 | consumed samples: 78800 | elapsed time per iteration (ms): 14680.0 | learning rate: 2.182E-05 | global batch size: 32 | lm loss: 6.509182E+00 | loss scale: 16384.0 | grad norm: 77431.860 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3940/ 159576 | consumed samples: 78832 | elapsed time per iteration (ms): 14561.3 | learning rate: 2.183E-05 | 
global batch size: 32 | lm loss: 6.521114E+00 | loss scale: 16384.0 | grad norm: 67107.459 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3941/ 159576 | consumed samples: 78864 | elapsed time per iteration (ms): 14540.3 | learning rate: 2.184E-05 | global batch size: 32 | lm loss: 6.557777E+00 | loss scale: 16384.0 | grad norm: 82252.980 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3942/ 159576 | consumed samples: 78896 | elapsed time per iteration (ms): 14516.4 | learning rate: 2.185E-05 | global batch size: 32 | lm loss: 6.519272E+00 | loss scale: 16384.0 | grad norm: 62956.678 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3943/ 159576 | consumed samples: 78928 | elapsed time per iteration (ms): 14804.0 | learning rate: 2.186E-05 | global batch size: 32 | lm loss: 6.436077E+00 | loss scale: 16384.0 | grad norm: 63372.650 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3944/ 159576 | consumed samples: 78960 | elapsed time per iteration (ms): 14504.5 | learning rate: 2.187E-05 | global batch size: 32 | lm loss: 6.536609E+00 | loss scale: 16384.0 | grad norm: 70623.314 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3945/ 159576 | consumed samples: 78992 | elapsed time per iteration (ms): 14519.8 | learning rate: 2.187E-05 | global batch size: 32 | lm loss: 6.631818E+00 | loss scale: 16384.0 | grad norm: 62267.463 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3946/ 159576 | consumed samples: 79024 | elapsed time per iteration (ms): 14592.1 | learning rate: 2.188E-05 | global batch size: 32 | lm loss: 6.263665E+00 | loss scale: 16384.0 | grad norm: 67107.842 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 
0 | time (ms) iteration 3947/ 159576 | consumed samples: 79056 | elapsed time per iteration (ms): 14791.6 | learning rate: 2.189E-05 | global batch size: 32 | lm loss: 6.622372E+00 | loss scale: 16384.0 | grad norm: 84764.799 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3948/ 159576 | consumed samples: 79088 | elapsed time per iteration (ms): 14637.3 | learning rate: 2.190E-05 | global batch size: 32 | lm loss: 6.395759E+00 | loss scale: 16384.0 | grad norm: 60113.545 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3949/ 159576 | consumed samples: 79120 | elapsed time per iteration (ms): 14546.6 | learning rate: 2.191E-05 | global batch size: 32 | lm loss: 6.588756E+00 | loss scale: 16384.0 | grad norm: 68679.133 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3950/ 159576 | consumed samples: 79152 | elapsed time per iteration (ms): 14514.6 | learning rate: 2.192E-05 | global batch size: 32 | lm loss: 6.484011E+00 | loss scale: 16384.0 | grad norm: 68729.821 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3951/ 159576 | consumed samples: 79184 | elapsed time per iteration (ms): 14907.8 | learning rate: 2.193E-05 | global batch size: 32 | lm loss: 6.496289E+00 | loss scale: 16384.0 | grad norm: 58918.789 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3952/ 159576 | consumed samples: 79216 | elapsed time per iteration (ms): 14467.7 | learning rate: 2.194E-05 | global batch size: 32 | lm loss: 6.442475E+00 | loss scale: 16384.0 | grad norm: 73240.452 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3953/ 159576 | consumed samples: 79248 | elapsed time per iteration (ms): 14613.3 | learning rate: 2.195E-05 | global batch size: 32 | lm loss: 
6.412640E+00 | loss scale: 16384.0 | grad norm: 63495.861 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3954/ 159576 | consumed samples: 79280 | elapsed time per iteration (ms): 14497.1 | learning rate: 2.195E-05 | global batch size: 32 | lm loss: 6.419092E+00 | loss scale: 16384.0 | grad norm: 64832.581 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3955/ 159576 | consumed samples: 79312 | elapsed time per iteration (ms): 14864.8 | learning rate: 2.196E-05 | global batch size: 32 | lm loss: 6.411493E+00 | loss scale: 16384.0 | grad norm: 70227.738 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3956/ 159576 | consumed samples: 79344 | elapsed time per iteration (ms): 14501.1 | learning rate: 2.197E-05 | global batch size: 32 | lm loss: 6.377773E+00 | loss scale: 16384.0 | grad norm: 65521.131 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3957/ 159576 | consumed samples: 79376 | elapsed time per iteration (ms): 14522.7 | learning rate: 2.198E-05 | global batch size: 32 | lm loss: 6.458980E+00 | loss scale: 16384.0 | grad norm: 62294.197 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3958/ 159576 | consumed samples: 79408 | elapsed time per iteration (ms): 14509.2 | learning rate: 2.199E-05 | global batch size: 32 | lm loss: 6.540348E+00 | loss scale: 16384.0 | grad norm: 64994.102 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3959/ 159576 | consumed samples: 79440 | elapsed time per iteration (ms): 14868.7 | learning rate: 2.200E-05 | global batch size: 32 | lm loss: 6.503858E+00 | loss scale: 16384.0 | grad norm: 54271.909 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3960/ 
159576 | consumed samples: 79472 | elapsed time per iteration (ms): 14512.5 | learning rate: 2.201E-05 | global batch size: 32 | lm loss: 6.372645E+00 | loss scale: 16384.0 | grad norm: 73237.307 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3961/ 159576 | consumed samples: 79504 | elapsed time per iteration (ms): 14552.3 | learning rate: 2.202E-05 | global batch size: 32 | lm loss: 6.396554E+00 | loss scale: 16384.0 | grad norm: 64579.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3962/ 159576 | consumed samples: 79536 | elapsed time per iteration (ms): 14559.3 | learning rate: 2.203E-05 | global batch size: 32 | lm loss: 6.556979E+00 | loss scale: 16384.0 | grad norm: 83489.476 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3963/ 159576 | consumed samples: 79568 | elapsed time per iteration (ms): 14899.9 | learning rate: 2.203E-05 | global batch size: 32 | lm loss: 6.458327E+00 | loss scale: 16384.0 | grad norm: 58716.823 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3964/ 159576 | consumed samples: 79600 | elapsed time per iteration (ms): 14539.5 | learning rate: 2.204E-05 | global batch size: 32 | lm loss: 6.802517E+00 | loss scale: 16384.0 | grad norm: 60731.153 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3965/ 159576 | consumed samples: 79632 | elapsed time per iteration (ms): 14520.1 | learning rate: 2.205E-05 | global batch size: 32 | lm loss: 6.616902E+00 | loss scale: 16384.0 | grad norm: 64155.719 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3966/ 159576 | consumed samples: 79664 | elapsed time per iteration (ms): 14585.2 | learning rate: 2.206E-05 | global batch size: 32 | lm loss: 6.457995E+00 | loss scale: 
16384.0 | grad norm: 74880.971 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3967/ 159576 | consumed samples: 79696 | elapsed time per iteration (ms): 14850.0 | learning rate: 2.207E-05 | global batch size: 32 | lm loss: 6.591904E+00 | loss scale: 16384.0 | grad norm: 75336.614 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3968/ 159576 | consumed samples: 79728 | elapsed time per iteration (ms): 14661.7 | learning rate: 2.208E-05 | global batch size: 32 | lm loss: 6.475752E+00 | loss scale: 16384.0 | grad norm: 76852.677 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3969/ 159576 | consumed samples: 79760 | elapsed time per iteration (ms): 14523.7 | learning rate: 2.209E-05 | global batch size: 32 | lm loss: 6.452621E+00 | loss scale: 16384.0 | grad norm: 65844.475 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3970/ 159576 | consumed samples: 79792 | elapsed time per iteration (ms): 14549.1 | learning rate: 2.210E-05 | global batch size: 32 | lm loss: 6.401618E+00 | loss scale: 16384.0 | grad norm: 84954.581 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3971/ 159576 | consumed samples: 79824 | elapsed time per iteration (ms): 14508.8 | learning rate: 2.211E-05 | global batch size: 32 | lm loss: 6.516178E+00 | loss scale: 16384.0 | grad norm: 71111.037 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3972/ 159576 | consumed samples: 79856 | elapsed time per iteration (ms): 14847.5 | learning rate: 2.211E-05 | global batch size: 32 | lm loss: 6.601567E+00 | loss scale: 16384.0 | grad norm: 74563.765 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3973/ 159576 | consumed samples: 
79888 | elapsed time per iteration (ms): 14594.0 | learning rate: 2.212E-05 | global batch size: 32 | lm loss: 6.441951E+00 | loss scale: 16384.0 | grad norm: 72653.525 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3974/ 159576 | consumed samples: 79920 | elapsed time per iteration (ms): 14478.4 | learning rate: 2.213E-05 | global batch size: 32 | lm loss: 6.510294E+00 | loss scale: 16384.0 | grad norm: 65083.374 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3975/ 159576 | consumed samples: 79952 | elapsed time per iteration (ms): 14520.1 | learning rate: 2.214E-05 | global batch size: 32 | lm loss: 6.345959E+00 | loss scale: 16384.0 | grad norm: 133600.019 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3976/ 159576 | consumed samples: 79984 | elapsed time per iteration (ms): 14770.3 | learning rate: 2.215E-05 | global batch size: 32 | lm loss: 6.477483E+00 | loss scale: 16384.0 | grad norm: 89443.795 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3977/ 159576 | consumed samples: 80016 | elapsed time per iteration (ms): 14483.7 | learning rate: 2.216E-05 | global batch size: 32 | lm loss: 6.466526E+00 | loss scale: 16384.0 | grad norm: 79203.283 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3978/ 159576 | consumed samples: 80048 | elapsed time per iteration (ms): 14548.9 | learning rate: 2.217E-05 | global batch size: 32 | lm loss: 6.490917E+00 | loss scale: 16384.0 | grad norm: 85035.254 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3979/ 159576 | consumed samples: 80080 | elapsed time per iteration (ms): 14519.8 | learning rate: 2.218E-05 | global batch size: 32 | lm loss: 6.412145E+00 | loss scale: 16384.0 | grad norm: 93580.388 
| num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3980/ 159576 | consumed samples: 80112 | elapsed time per iteration (ms): 14659.7 | learning rate: 2.218E-05 | global batch size: 32 | lm loss: 6.473646E+00 | loss scale: 16384.0 | grad norm: 79422.522 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3981/ 159576 | consumed samples: 80144 | elapsed time per iteration (ms): 14525.1 | learning rate: 2.219E-05 | global batch size: 32 | lm loss: 6.522334E+00 | loss scale: 16384.0 | grad norm: 83533.865 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3982/ 159576 | consumed samples: 80176 | elapsed time per iteration (ms): 14543.1 | learning rate: 2.220E-05 | global batch size: 32 | lm loss: 6.387228E+00 | loss scale: 16384.0 | grad norm: 89795.957 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3983/ 159576 | consumed samples: 80208 | elapsed time per iteration (ms): 14609.8 | learning rate: 2.221E-05 | global batch size: 32 | lm loss: 6.475267E+00 | loss scale: 16384.0 | grad norm: 119598.589 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3984/ 159576 | consumed samples: 80240 | elapsed time per iteration (ms): 14596.2 | learning rate: 2.222E-05 | global batch size: 32 | lm loss: 6.533351E+00 | loss scale: 16384.0 | grad norm: 72306.036 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3985/ 159576 | consumed samples: 80272 | elapsed time per iteration (ms): 14621.5 | learning rate: 2.223E-05 | global batch size: 32 | lm loss: 6.540237E+00 | loss scale: 16384.0 | grad norm: 88358.505 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3986/ 159576 | consumed samples: 80304 | elapsed time per 
iteration (ms): 14563.8 | learning rate: 2.224E-05 | global batch size: 32 | lm loss: 6.419699E+00 | loss scale: 16384.0 | grad norm: 75411.849 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3987/ 159576 | consumed samples: 80336 | elapsed time per iteration (ms): 14555.9 | learning rate: 2.225E-05 | global batch size: 32 | lm loss: 6.591748E+00 | loss scale: 16384.0 | grad norm: 112139.715 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3988/ 159576 | consumed samples: 80368 | elapsed time per iteration (ms): 15004.4 | learning rate: 2.226E-05 | global batch size: 32 | lm loss: 6.551664E+00 | loss scale: 16384.0 | grad norm: 88397.931 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3989/ 159576 | consumed samples: 80400 | elapsed time per iteration (ms): 14610.9 | learning rate: 2.226E-05 | global batch size: 32 | lm loss: 6.531049E+00 | loss scale: 16384.0 | grad norm: 63924.116 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3990/ 159576 | consumed samples: 80432 | elapsed time per iteration (ms): 14532.5 | learning rate: 2.227E-05 | global batch size: 32 | lm loss: 6.546918E+00 | loss scale: 16384.0 | grad norm: 97299.376 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3991/ 159576 | consumed samples: 80464 | elapsed time per iteration (ms): 14437.4 | learning rate: 2.228E-05 | global batch size: 32 | lm loss: 6.471569E+00 | loss scale: 16384.0 | grad norm: 76326.402 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3992/ 159576 | consumed samples: 80496 | elapsed time per iteration (ms): 14906.8 | learning rate: 2.229E-05 | global batch size: 32 | lm loss: 6.525407E+00 | loss scale: 16384.0 | grad norm: 77183.511 | num zeros: 0.0 | 
number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3993/ 159576 | consumed samples: 80528 | elapsed time per iteration (ms): 14534.2 | learning rate: 2.230E-05 | global batch size: 32 | lm loss: 6.539597E+00 | loss scale: 16384.0 | grad norm: 60376.571 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3994/ 159576 | consumed samples: 80560 | elapsed time per iteration (ms): 14579.3 | learning rate: 2.231E-05 | global batch size: 32 | lm loss: 6.552666E+00 | loss scale: 16384.0 | grad norm: 84746.386 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3995/ 159576 | consumed samples: 80592 | elapsed time per iteration (ms): 14529.3 | learning rate: 2.232E-05 | global batch size: 32 | lm loss: 6.413946E+00 | loss scale: 16384.0 | grad norm: 67969.641 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3996/ 159576 | consumed samples: 80624 | elapsed time per iteration (ms): 14922.8 | learning rate: 2.233E-05 | global batch size: 32 | lm loss: 6.427969E+00 | loss scale: 16384.0 | grad norm: 69649.626 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3997/ 159576 | consumed samples: 80656 | elapsed time per iteration (ms): 14490.2 | learning rate: 2.234E-05 | global batch size: 32 | lm loss: 6.527985E+00 | loss scale: 16384.0 | grad norm: 71644.429 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3998/ 159576 | consumed samples: 80688 | elapsed time per iteration (ms): 14523.9 | learning rate: 2.234E-05 | global batch size: 32 | lm loss: 6.565372E+00 | loss scale: 16384.0 | grad norm: 75823.635 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3999/ 159576 | consumed samples: 80720 | elapsed time per iteration (ms): 14567.0 | 
learning rate: 2.235E-05 | global batch size: 32 | lm loss: 6.377391E+00 | loss scale: 16384.0 | grad norm: 85967.590 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
[2021-09-24 17:49:17,238] [INFO] [logging.py:68:log_dist] [Rank 0] step=4000, skipped=6, lr=[2.2362426035502958e-05, 2.2362426035502958e-05], mom=[(0.9, 0.999), (0.9, 0.999)]
steps: 4000 loss: 6.3572 iter time (s): 0.007 samples/sec: 4556.985
iteration 4000/ 159576 | consumed samples: 80752 | elapsed time per iteration (ms): 14858.7 | learning rate: 2.236E-05 | global batch size: 32 | lm loss: 6.357164E+00 | loss scale: 16384.0 | grad norm: 56878.207 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
------------------------------------------------------------------------------------------------
validation loss at iteration 4000 | lm loss value: 6.453290E+00 | lm loss PPL: 6.347876E+02 |
------------------------------------------------------------------------------------------------
iteration 4001/ 159576 | consumed samples: 80784 | elapsed time per iteration (ms): 20796.3 | learning rate: 2.237E-05 | global batch size: 32 | lm loss: 6.357805E+00 | loss scale: 16384.0 | grad norm: 75271.208 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4002/ 159576 | consumed samples: 80816 | elapsed time per iteration (ms): 14528.3 | learning rate: 2.238E-05 | global batch size: 32 | lm loss: 6.590372E+00 | loss scale: 16384.0 | grad norm: 82823.216 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4003/ 159576 | consumed samples: 80848 | elapsed time per iteration (ms): 14569.0 | learning rate: 2.239E-05 | global batch size: 32 | lm loss: 6.547601E+00 | loss scale: 16384.0 | grad norm: 63495.848 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4004/ 159576 | consumed
samples: 80880 | elapsed time per iteration (ms): 14981.7 | learning rate: 2.240E-05 | global batch size: 32 | lm loss: 6.488581E+00 | loss scale: 16384.0 | grad norm: 84538.823 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4005/ 159576 | consumed samples: 80912 | elapsed time per iteration (ms): 14517.6 | learning rate: 2.241E-05 | global batch size: 32 | lm loss: 6.473035E+00 | loss scale: 16384.0 | grad norm: 69154.929 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4006/ 159576 | consumed samples: 80944 | elapsed time per iteration (ms): 14515.3 | learning rate: 2.242E-05 | global batch size: 32 | lm loss: 6.574604E+00 | loss scale: 16384.0 | grad norm: 71258.786 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4007/ 159576 | consumed samples: 80976 | elapsed time per iteration (ms): 14530.3 | learning rate: 2.242E-05 | global batch size: 32 | lm loss: 6.480978E+00 | loss scale: 16384.0 | grad norm: 63598.555 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4008/ 159576 | consumed samples: 81008 | elapsed time per iteration (ms): 15052.4 | learning rate: 2.243E-05 | global batch size: 32 | lm loss: 6.393389E+00 | loss scale: 16384.0 | grad norm: 76474.916 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4009/ 159576 | consumed samples: 81040 | elapsed time per iteration (ms): 14618.9 | learning rate: 2.244E-05 | global batch size: 32 | lm loss: 6.322450E+00 | loss scale: 16384.0 | grad norm: 62736.146 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4010/ 159576 | consumed samples: 81072 | elapsed time per iteration (ms): 14521.7 | learning rate: 2.245E-05 | global batch size: 32 | lm loss: 6.502364E+00 | loss scale: 16384.0 | grad norm: 
78751.861 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4011/ 159576 | consumed samples: 81104 | elapsed time per iteration (ms): 14513.4 | learning rate: 2.246E-05 | global batch size: 32 | lm loss: 6.504915E+00 | loss scale: 16384.0 | grad norm: 73290.420 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4012/ 159576 | consumed samples: 81136 | elapsed time per iteration (ms): 14859.5 | learning rate: 2.247E-05 | global batch size: 32 | lm loss: 6.422670E+00 | loss scale: 16384.0 | grad norm: 70911.343 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4013/ 159576 | consumed samples: 81168 | elapsed time per iteration (ms): 14562.7 | learning rate: 2.248E-05 | global batch size: 32 | lm loss: 6.460926E+00 | loss scale: 16384.0 | grad norm: 88361.679 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4014/ 159576 | consumed samples: 81200 | elapsed time per iteration (ms): 14537.6 | learning rate: 2.249E-05 | global batch size: 32 | lm loss: 6.359708E+00 | loss scale: 16384.0 | grad norm: 70950.803 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4015/ 159576 | consumed samples: 81232 | elapsed time per iteration (ms): 14575.5 | learning rate: 2.250E-05 | global batch size: 32 | lm loss: 6.479752E+00 | loss scale: 16384.0 | grad norm: 60916.908 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4016/ 159576 | consumed samples: 81264 | elapsed time per iteration (ms): 14890.4 | learning rate: 2.250E-05 | global batch size: 32 | lm loss: 6.438080E+00 | loss scale: 16384.0 | grad norm: 78503.860 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4017/ 159576 | consumed samples: 81296 | elapsed time 
[training log: iterations 4017–4146 of 159576 | consumed samples 81328 → 85424 | global batch size: 32 | elapsed time per iteration ≈ 14.4–15.0 s | learning rate ramping 2.251E-05 → 2.365E-05 | lm loss fluctuating between 6.28E+00 and 6.69E+00 | loss scale: 16384.0 throughout | grad norm mostly in the 62k–128k range, with a spike to 255152.408 at iteration 4119 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0]

[2021-09-24 18:07:07] PULSE: tr8-104B is waiting for the previous job to finish before scheduling a new one using the dependency mechanism (1165978_[1-10%1] on 'gpu_p13' partition)
[2021-09-24 18:07:07] PULSE: tr8-104B is running for 12:14:56 since 2021-09-24T05:52:11 (1162855_1 on 'gpu_p13' partition (r6i4n[5,7],r6i5n[2,7-8],r6i6n[0,2,6],r7i2n[4-5],r7i6n[2-4],r7i7n[7-8],r8i0n[2-3,5-8],r8i1n[0,2-4],r8i2n8,r8i3n[0-2],r8i5n[3-4],r8i7n[3-8],r9i0n[0-2],r9i1n[0-3],r9i2n[3-5,8],r9i3n[0-1,7-8],r9i4n[0-2],r9i5n[3-8],r9i6n[0,7-8])
rate: 2.366E-05 | global batch size: 32 | lm loss: 6.483904E+00 | loss scale: 16384.0 | grad norm: 117143.742 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4147/ 159576 | consumed samples: 85456 | elapsed time per iteration (ms): 14531.1 | learning rate: 2.367E-05 | global batch size: 32 | lm loss: 6.363456E+00 | loss scale: 16384.0 | grad norm: 88860.011 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4148/ 159576 | consumed samples: 85488 | elapsed time per iteration (ms): 14766.7 | learning rate: 2.368E-05 | global batch size: 32 | lm loss: 6.523079E+00 | loss scale: 16384.0 | grad norm: 87677.210 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4149/ 159576 | consumed samples: 85520 | elapsed time per iteration (ms): 14507.2 | learning rate: 2.368E-05 | global batch size: 32 | lm loss: 6.553520E+00 | loss scale: 16384.0 | grad norm: 121742.594 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4150/ 159576 | consumed samples: 85552 | elapsed time per iteration (ms): 14548.6 | learning rate: 2.369E-05 | global batch size: 32 | lm loss: 6.490498E+00 | loss scale: 16384.0 | grad norm: 89599.956 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4151/ 159576 | consumed samples: 85584 | elapsed time per iteration (ms): 14535.8 | learning rate: 2.370E-05 | global batch size: 32 | lm loss: 6.498284E+00 | loss scale: 16384.0 | grad norm: 103857.489 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4152/ 159576 | consumed samples: 85616 | elapsed time per iteration (ms): 14637.7 | learning rate: 2.371E-05 | global batch size: 32 | lm loss: 6.607250E+00 | loss scale: 16384.0 | grad norm: 80792.955 | num zeros: 0.0 | number of skipped iterations: 0 | 
number of nan iterations: 0 | time (ms) iteration 4153/ 159576 | consumed samples: 85648 | elapsed time per iteration (ms): 14584.8 | learning rate: 2.372E-05 | global batch size: 32 | lm loss: 6.465719E+00 | loss scale: 16384.0 | grad norm: 76852.004 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4154/ 159576 | consumed samples: 85680 | elapsed time per iteration (ms): 14575.3 | learning rate: 2.373E-05 | global batch size: 32 | lm loss: 6.475266E+00 | loss scale: 16384.0 | grad norm: 87775.649 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4155/ 159576 | consumed samples: 85712 | elapsed time per iteration (ms): 14452.5 | learning rate: 2.374E-05 | global batch size: 32 | lm loss: 6.456027E+00 | loss scale: 16384.0 | grad norm: 75377.279 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4156/ 159576 | consumed samples: 85744 | elapsed time per iteration (ms): 14769.4 | learning rate: 2.375E-05 | global batch size: 32 | lm loss: 6.436621E+00 | loss scale: 16384.0 | grad norm: 86270.120 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4157/ 159576 | consumed samples: 85776 | elapsed time per iteration (ms): 14484.6 | learning rate: 2.376E-05 | global batch size: 32 | lm loss: 6.502521E+00 | loss scale: 16384.0 | grad norm: 77291.631 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4158/ 159576 | consumed samples: 85808 | elapsed time per iteration (ms): 14605.4 | learning rate: 2.376E-05 | global batch size: 32 | lm loss: 6.271915E+00 | loss scale: 16384.0 | grad norm: 79782.510 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4159/ 159576 | consumed samples: 85840 | elapsed time per iteration (ms): 14468.5 | learning rate: 2.377E-05 | global 
batch size: 32 | lm loss: 6.375775E+00 | loss scale: 16384.0 | grad norm: 91679.045 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4160/ 159576 | consumed samples: 85872 | elapsed time per iteration (ms): 15055.2 | learning rate: 2.378E-05 | global batch size: 32 | lm loss: 6.207356E+00 | loss scale: 16384.0 | grad norm: 84700.576 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4161/ 159576 | consumed samples: 85904 | elapsed time per iteration (ms): 14639.9 | learning rate: 2.379E-05 | global batch size: 32 | lm loss: 6.385208E+00 | loss scale: 16384.0 | grad norm: 77383.793 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4162/ 159576 | consumed samples: 85936 | elapsed time per iteration (ms): 14461.5 | learning rate: 2.380E-05 | global batch size: 32 | lm loss: 6.480938E+00 | loss scale: 16384.0 | grad norm: 98154.860 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4163/ 159576 | consumed samples: 85968 | elapsed time per iteration (ms): 14557.2 | learning rate: 2.381E-05 | global batch size: 32 | lm loss: 6.427241E+00 | loss scale: 16384.0 | grad norm: 79663.274 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4164/ 159576 | consumed samples: 86000 | elapsed time per iteration (ms): 15046.3 | learning rate: 2.382E-05 | global batch size: 32 | lm loss: 6.310709E+00 | loss scale: 16384.0 | grad norm: 76469.866 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4165/ 159576 | consumed samples: 86032 | elapsed time per iteration (ms): 14517.1 | learning rate: 2.383E-05 | global batch size: 32 | lm loss: 6.597423E+00 | loss scale: 16384.0 | grad norm: 95179.205 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | 
time (ms) iteration 4166/ 159576 | consumed samples: 86064 | elapsed time per iteration (ms): 14562.4 | learning rate: 2.384E-05 | global batch size: 32 | lm loss: 6.398317E+00 | loss scale: 16384.0 | grad norm: 86889.280 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4167/ 159576 | consumed samples: 86096 | elapsed time per iteration (ms): 14577.1 | learning rate: 2.384E-05 | global batch size: 32 | lm loss: 6.447660E+00 | loss scale: 16384.0 | grad norm: 99510.529 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4168/ 159576 | consumed samples: 86128 | elapsed time per iteration (ms): 14813.0 | learning rate: 2.385E-05 | global batch size: 32 | lm loss: 6.528482E+00 | loss scale: 16384.0 | grad norm: 83413.223 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4169/ 159576 | consumed samples: 86160 | elapsed time per iteration (ms): 14589.9 | learning rate: 2.386E-05 | global batch size: 32 | lm loss: 6.388697E+00 | loss scale: 16384.0 | grad norm: 76722.933 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4170/ 159576 | consumed samples: 86192 | elapsed time per iteration (ms): 14519.5 | learning rate: 2.387E-05 | global batch size: 32 | lm loss: 6.446240E+00 | loss scale: 16384.0 | grad norm: 85947.294 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4171/ 159576 | consumed samples: 86224 | elapsed time per iteration (ms): 14524.6 | learning rate: 2.388E-05 | global batch size: 32 | lm loss: 6.425363E+00 | loss scale: 16384.0 | grad norm: 88474.007 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4172/ 159576 | consumed samples: 86256 | elapsed time per iteration (ms): 14879.2 | learning rate: 2.389E-05 | global batch size: 32 | lm loss: 
6.515138E+00 | loss scale: 16384.0 | grad norm: 108134.568 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4173/ 159576 | consumed samples: 86288 | elapsed time per iteration (ms): 14582.3 | learning rate: 2.390E-05 | global batch size: 32 | lm loss: 6.533965E+00 | loss scale: 16384.0 | grad norm: 76749.086 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4174/ 159576 | consumed samples: 86320 | elapsed time per iteration (ms): 14543.3 | learning rate: 2.391E-05 | global batch size: 32 | lm loss: 6.448212E+00 | loss scale: 16384.0 | grad norm: 93972.310 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4175/ 159576 | consumed samples: 86352 | elapsed time per iteration (ms): 14572.0 | learning rate: 2.392E-05 | global batch size: 32 | lm loss: 6.440217E+00 | loss scale: 16384.0 | grad norm: 102291.612 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4176/ 159576 | consumed samples: 86384 | elapsed time per iteration (ms): 14897.3 | learning rate: 2.392E-05 | global batch size: 32 | lm loss: 6.324600E+00 | loss scale: 16384.0 | grad norm: 81057.900 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4177/ 159576 | consumed samples: 86416 | elapsed time per iteration (ms): 14575.9 | learning rate: 2.393E-05 | global batch size: 32 | lm loss: 6.564878E+00 | loss scale: 16384.0 | grad norm: 96270.150 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4178/ 159576 | consumed samples: 86448 | elapsed time per iteration (ms): 14585.7 | learning rate: 2.394E-05 | global batch size: 32 | lm loss: 6.473108E+00 | loss scale: 16384.0 | grad norm: 80498.059 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4179/ 
159576 | consumed samples: 86480 | elapsed time per iteration (ms): 14517.6 | learning rate: 2.395E-05 | global batch size: 32 | lm loss: 6.519761E+00 | loss scale: 16384.0 | grad norm: 90509.323 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4180/ 159576 | consumed samples: 86512 | elapsed time per iteration (ms): 14895.7 | learning rate: 2.396E-05 | global batch size: 32 | lm loss: 6.377243E+00 | loss scale: 16384.0 | grad norm: 92370.262 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4181/ 159576 | consumed samples: 86544 | elapsed time per iteration (ms): 14690.0 | learning rate: 2.397E-05 | global batch size: 32 | lm loss: 6.469300E+00 | loss scale: 16384.0 | grad norm: 89492.362 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4182/ 159576 | consumed samples: 86576 | elapsed time per iteration (ms): 14557.6 | learning rate: 2.398E-05 | global batch size: 32 | lm loss: 6.497668E+00 | loss scale: 16384.0 | grad norm: 104899.693 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4183/ 159576 | consumed samples: 86608 | elapsed time per iteration (ms): 14588.2 | learning rate: 2.399E-05 | global batch size: 32 | lm loss: 6.412446E+00 | loss scale: 16384.0 | grad norm: 81267.948 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4184/ 159576 | consumed samples: 86640 | elapsed time per iteration (ms): 14486.7 | learning rate: 2.400E-05 | global batch size: 32 | lm loss: 6.486274E+00 | loss scale: 16384.0 | grad norm: 95404.434 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4185/ 159576 | consumed samples: 86672 | elapsed time per iteration (ms): 14942.6 | learning rate: 2.400E-05 | global batch size: 32 | lm loss: 6.375100E+00 | loss scale: 
16384.0 | grad norm: 82372.004 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4186/ 159576 | consumed samples: 86704 | elapsed time per iteration (ms): 14540.4 | learning rate: 2.401E-05 | global batch size: 32 | lm loss: 6.444688E+00 | loss scale: 16384.0 | grad norm: 102268.468 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4187/ 159576 | consumed samples: 86736 | elapsed time per iteration (ms): 14530.9 | learning rate: 2.402E-05 | global batch size: 32 | lm loss: 6.270885E+00 | loss scale: 16384.0 | grad norm: 85114.431 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4188/ 159576 | consumed samples: 86768 | elapsed time per iteration (ms): 14554.4 | learning rate: 2.403E-05 | global batch size: 32 | lm loss: 6.461191E+00 | loss scale: 16384.0 | grad norm: 82795.343 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4189/ 159576 | consumed samples: 86800 | elapsed time per iteration (ms): 14680.7 | learning rate: 2.404E-05 | global batch size: 32 | lm loss: 6.483377E+00 | loss scale: 16384.0 | grad norm: 106142.212 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4190/ 159576 | consumed samples: 86832 | elapsed time per iteration (ms): 14652.1 | learning rate: 2.405E-05 | global batch size: 32 | lm loss: 6.468819E+00 | loss scale: 16384.0 | grad norm: 83557.244 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4191/ 159576 | consumed samples: 86864 | elapsed time per iteration (ms): 14459.3 | learning rate: 2.406E-05 | global batch size: 32 | lm loss: 6.379012E+00 | loss scale: 16384.0 | grad norm: 90619.727 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4192/ 159576 | consumed samples: 
86896 | elapsed time per iteration (ms): 14539.1 | learning rate: 2.407E-05 | global batch size: 32 | lm loss: 6.459314E+00 | loss scale: 16384.0 | grad norm: 94282.455 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4193/ 159576 | consumed samples: 86928 | elapsed time per iteration (ms): 14715.7 | learning rate: 2.408E-05 | global batch size: 32 | lm loss: 6.435170E+00 | loss scale: 16384.0 | grad norm: 92946.885 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4194/ 159576 | consumed samples: 86960 | elapsed time per iteration (ms): 14501.7 | learning rate: 2.408E-05 | global batch size: 32 | lm loss: 6.419791E+00 | loss scale: 16384.0 | grad norm: 78251.108 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4195/ 159576 | consumed samples: 86992 | elapsed time per iteration (ms): 14523.0 | learning rate: 2.409E-05 | global batch size: 32 | lm loss: 6.342591E+00 | loss scale: 16384.0 | grad norm: 80571.454 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4196/ 159576 | consumed samples: 87024 | elapsed time per iteration (ms): 14595.3 | learning rate: 2.410E-05 | global batch size: 32 | lm loss: 6.373145E+00 | loss scale: 16384.0 | grad norm: 106409.932 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4197/ 159576 | consumed samples: 87056 | elapsed time per iteration (ms): 14737.5 | learning rate: 2.411E-05 | global batch size: 32 | lm loss: 6.543087E+00 | loss scale: 16384.0 | grad norm: 81359.049 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4198/ 159576 | consumed samples: 87088 | elapsed time per iteration (ms): 14570.3 | learning rate: 2.412E-05 | global batch size: 32 | lm loss: 6.555972E+00 | loss scale: 16384.0 | grad norm: 
101442.652 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4199/ 159576 | consumed samples: 87120 | elapsed time per iteration (ms): 14518.0 | learning rate: 2.413E-05 | global batch size: 32 | lm loss: 6.497987E+00 | loss scale: 16384.0 | grad norm: 87789.780 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4200/ 159576 | consumed samples: 87152 | elapsed time per iteration (ms): 14561.0 | learning rate: 2.414E-05 | global batch size: 32 | lm loss: 6.526636E+00 | loss scale: 16384.0 | grad norm: 97375.608 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4201/ 159576 | consumed samples: 87184 | elapsed time per iteration (ms): 14967.8 | learning rate: 2.415E-05 | global batch size: 32 | lm loss: 6.529594E+00 | loss scale: 16384.0 | grad norm: 98056.606 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4202/ 159576 | consumed samples: 87216 | elapsed time per iteration (ms): 14591.5 | learning rate: 2.416E-05 | global batch size: 32 | lm loss: 6.461559E+00 | loss scale: 16384.0 | grad norm: 103248.801 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4203/ 159576 | consumed samples: 87248 | elapsed time per iteration (ms): 14557.3 | learning rate: 2.416E-05 | global batch size: 32 | lm loss: 6.255905E+00 | loss scale: 16384.0 | grad norm: 98489.984 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4204/ 159576 | consumed samples: 87280 | elapsed time per iteration (ms): 14539.8 | learning rate: 2.417E-05 | global batch size: 32 | lm loss: 6.456792E+00 | loss scale: 16384.0 | grad norm: 90220.601 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4205/ 159576 | consumed samples: 87312 | elapsed time 
per iteration (ms): 14936.2 | learning rate: 2.418E-05 | global batch size: 32 | lm loss: 6.456956E+00 | loss scale: 16384.0 | grad norm: 99591.028 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4206/ 159576 | consumed samples: 87344 | elapsed time per iteration (ms): 14602.1 | learning rate: 2.419E-05 | global batch size: 32 | lm loss: 6.539675E+00 | loss scale: 16384.0 | grad norm: 106461.971 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4207/ 159576 | consumed samples: 87376 | elapsed time per iteration (ms): 14518.5 | learning rate: 2.420E-05 | global batch size: 32 | lm loss: 6.581583E+00 | loss scale: 16384.0 | grad norm: 104474.944 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4208/ 159576 | consumed samples: 87408 | elapsed time per iteration (ms): 14546.2 | learning rate: 2.421E-05 | global batch size: 32 | lm loss: 6.470299E+00 | loss scale: 16384.0 | grad norm: 103936.744 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4209/ 159576 | consumed samples: 87440 | elapsed time per iteration (ms): 14895.0 | learning rate: 2.422E-05 | global batch size: 32 | lm loss: 6.485046E+00 | loss scale: 16384.0 | grad norm: 103480.479 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4210/ 159576 | consumed samples: 87472 | elapsed time per iteration (ms): 14490.7 | learning rate: 2.423E-05 | global batch size: 32 | lm loss: 6.331614E+00 | loss scale: 16384.0 | grad norm: 92393.675 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4211/ 159576 | consumed samples: 87504 | elapsed time per iteration (ms): 14505.6 | learning rate: 2.424E-05 | global batch size: 32 | lm loss: 6.343493E+00 | loss scale: 16384.0 | grad norm: 138840.853 | num zeros: 0.0 
| number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4212/ 159576 | consumed samples: 87536 | elapsed time per iteration (ms): 14559.8 | learning rate: 2.424E-05 | global batch size: 32 | lm loss: 6.362164E+00 | loss scale: 16384.0 | grad norm: 105314.560 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4213/ 159576 | consumed samples: 87568 | elapsed time per iteration (ms): 14962.7 | learning rate: 2.425E-05 | global batch size: 32 | lm loss: 6.413978E+00 | loss scale: 16384.0 | grad norm: 100396.214 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4214/ 159576 | consumed samples: 87600 | elapsed time per iteration (ms): 14459.8 | learning rate: 2.426E-05 | global batch size: 32 | lm loss: 6.333343E+00 | loss scale: 16384.0 | grad norm: 101809.236 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4215/ 159576 | consumed samples: 87632 | elapsed time per iteration (ms): 14541.9 | learning rate: 2.427E-05 | global batch size: 32 | lm loss: 6.552740E+00 | loss scale: 16384.0 | grad norm: 198031.215 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4216/ 159576 | consumed samples: 87664 | elapsed time per iteration (ms): 14546.7 | learning rate: 2.428E-05 | global batch size: 32 | lm loss: 6.373903E+00 | loss scale: 16384.0 | grad norm: 98034.031 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4217/ 159576 | consumed samples: 87696 | elapsed time per iteration (ms): 14848.3 | learning rate: 2.429E-05 | global batch size: 32 | lm loss: 6.452424E+00 | loss scale: 16384.0 | grad norm: 267522.576 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4218/ 159576 | consumed samples: 87728 | elapsed time per iteration (ms): 
14570.6 | learning rate: 2.430E-05 | global batch size: 32 | lm loss: 6.493920E+00 | loss scale: 16384.0 | grad norm: 121372.560 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4219/ 159576 | consumed samples: 87760 | elapsed time per iteration (ms): 14553.1 | learning rate: 2.431E-05 | global batch size: 32 | lm loss: 6.478834E+00 | loss scale: 16384.0 | grad norm: 112151.991 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4220/ 159576 | consumed samples: 87792 | elapsed time per iteration (ms): 14546.6 | learning rate: 2.432E-05 | global batch size: 32 | lm loss: 6.452081E+00 | loss scale: 16384.0 | grad norm: 164176.147 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4221/ 159576 | consumed samples: 87824 | elapsed time per iteration (ms): 14866.7 | learning rate: 2.432E-05 | global batch size: 32 | lm loss: 6.616721E+00 | loss scale: 16384.0 | grad norm: 88412.117 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4222/ 159576 | consumed samples: 87856 | elapsed time per iteration (ms): 14831.9 | learning rate: 2.433E-05 | global batch size: 32 | lm loss: 6.396004E+00 | loss scale: 16384.0 | grad norm: 116548.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4223/ 159576 | consumed samples: 87888 | elapsed time per iteration (ms): 14530.1 | learning rate: 2.434E-05 | global batch size: 32 | lm loss: 6.223457E+00 | loss scale: 16384.0 | grad norm: 151936.770 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4224/ 159576 | consumed samples: 87920 | elapsed time per iteration (ms): 14526.4 | learning rate: 2.435E-05 | global batch size: 32 | lm loss: 6.471479E+00 | loss scale: 16384.0 | grad norm: 107150.884 | num zeros: 0.0 | number of 
skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4225/ 159576 | consumed samples: 87952 | elapsed time per iteration (ms): 14556.3 | learning rate: 2.436E-05 | global batch size: 32 | lm loss: 6.420123E+00 | loss scale: 16384.0 | grad norm: 118336.101 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4226/ 159576 | consumed samples: 87984 | elapsed time per iteration (ms): 14779.5 | learning rate: 2.437E-05 | global batch size: 32 | lm loss: 6.463729E+00 | loss scale: 16384.0 | grad norm: 105104.920 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4227/ 159576 | consumed samples: 88016 | elapsed time per iteration (ms): 14616.1 | learning rate: 2.438E-05 | global batch size: 32 | lm loss: 6.384348E+00 | loss scale: 16384.0 | grad norm: 121857.325 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4228/ 159576 | consumed samples: 88048 | elapsed time per iteration (ms): 14595.0 | learning rate: 2.439E-05 | global batch size: 32 | lm loss: 6.562186E+00 | loss scale: 16384.0 | grad norm: 120895.871 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4229/ 159576 | consumed samples: 88080 | elapsed time per iteration (ms): 14592.9 | learning rate: 2.439E-05 | global batch size: 32 | lm loss: 6.614166E+00 | loss scale: 16384.0 | grad norm: 141989.840 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4230/ 159576 | consumed samples: 88112 | elapsed time per iteration (ms): 14745.8 | learning rate: 2.440E-05 | global batch size: 32 | lm loss: 6.416856E+00 | loss scale: 16384.0 | grad norm: 135385.270 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4231/ 159576 | consumed samples: 88144 | elapsed time per iteration (ms): 14547.3 | 
learning rate: 2.441E-05 | global batch size: 32 | lm loss: 6.576384E+00 | loss scale: 16384.0 | grad norm: 129034.853 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4232/ 159576 | consumed samples: 88176 | elapsed time per iteration (ms): 14539.9 | learning rate: 2.442E-05 | global batch size: 32 | lm loss: 6.371499E+00 | loss scale: 16384.0 | grad norm: 102463.674 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4233/ 159576 | consumed samples: 88208 | elapsed time per iteration (ms): 14580.8 | learning rate: 2.443E-05 | global batch size: 32 | lm loss: 6.598085E+00 | loss scale: 16384.0 | grad norm: 105075.872 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4234/ 159576 | consumed samples: 88240 | elapsed time per iteration (ms): 14766.2 | learning rate: 2.444E-05 | global batch size: 32 | lm loss: 6.536204E+00 | loss scale: 16384.0 | grad norm: 109004.528 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4235/ 159576 | consumed samples: 88272 | elapsed time per iteration (ms): 14518.0 | learning rate: 2.445E-05 | global batch size: 32 | lm loss: 6.663161E+00 | loss scale: 16384.0 | grad norm: 197099.956 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4236/ 159576 | consumed samples: 88304 | elapsed time per iteration (ms): 14598.2 | learning rate: 2.446E-05 | global batch size: 32 | lm loss: 6.451008E+00 | loss scale: 16384.0 | grad norm: 125746.339 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4237/ 159576 | consumed samples: 88336 | elapsed time per iteration (ms): 14568.7 | learning rate: 2.447E-05 | global batch size: 32 | lm loss: 6.306778E+00 | loss scale: 16384.0 | grad norm: 145717.953 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4238/ 159576 | consumed samples: 88368 | elapsed time per iteration (ms): 14844.4 | learning rate: 2.447E-05 | global batch size: 32 | lm loss: 6.637146E+00 | loss scale: 16384.0 | grad norm: 161986.022 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4239/ 159576 | consumed samples: 88400 | elapsed time per iteration (ms): 14550.6 | learning rate: 2.448E-05 | global batch size: 32 | lm loss: 6.518569E+00 | loss scale: 16384.0 | grad norm: 114815.197 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4240/ 159576 | consumed samples: 88432 | elapsed time per iteration (ms): 14540.5 | learning rate: 2.449E-05 | global batch size: 32 | lm loss: 6.644086E+00 | loss scale: 16384.0 | grad norm: 127083.954 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4241/ 159576 | consumed samples: 88464 | elapsed time per iteration (ms): 14556.9 | learning rate: 2.450E-05 | global batch size: 32 | lm loss: 6.359149E+00 | loss scale: 16384.0 | grad norm: 119916.985 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4242/ 159576 | consumed samples: 88496 | elapsed time per iteration (ms): 14950.3 | learning rate: 2.451E-05 | global batch size: 32 | lm loss: 6.517668E+00 | loss scale: 16384.0 | grad norm: 116850.173 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4243/ 159576 | consumed samples: 88528 | elapsed time per iteration (ms): 14575.9 | learning rate: 2.452E-05 | global batch size: 32 | lm loss: 6.345152E+00 | loss scale: 16384.0 | grad norm: 106829.623 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4244/ 159576 | consumed samples: 88560 | elapsed time per iteration (ms): 14588.0 | learning rate: 2.453E-05 | global batch size: 32 | lm loss: 6.476923E+00 | loss scale: 16384.0 | grad norm: 121409.721 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4245/ 159576 | consumed samples: 88592 | elapsed time per iteration (ms): 14539.0 | learning rate: 2.454E-05 | global batch size: 32 | lm loss: 6.428369E+00 | loss scale: 16384.0 | grad norm: 99872.898 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4246/ 159576 | consumed samples: 88624 | elapsed time per iteration (ms): 15044.1 | learning rate: 2.455E-05 | global batch size: 32 | lm loss: 6.447415E+00 | loss scale: 16384.0 | grad norm: 102765.648 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4247/ 159576 | consumed samples: 88656 | elapsed time per iteration (ms): 14546.9 | learning rate: 2.455E-05 | global batch size: 32 | lm loss: 6.336578E+00 | loss scale: 16384.0 | grad norm: 90835.944 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4248/ 159576 | consumed samples: 88688 | elapsed time per iteration (ms): 14540.1 | learning rate: 2.456E-05 | global batch size: 32 | lm loss: 6.555513E+00 | loss scale: 16384.0 | grad norm: 104407.993 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4249/ 159576 | consumed samples: 88720 | elapsed time per iteration (ms): 14613.4 | learning rate: 2.457E-05 | global batch size: 32 | lm loss: 6.546042E+00 | loss scale: 16384.0 | grad norm: 115379.011 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4250/ 159576 | consumed samples: 88752 | elapsed time per iteration (ms): 14829.6 | learning rate: 2.458E-05 | global batch size: 32 | lm loss: 6.436588E+00 | loss scale: 16384.0 | grad norm: 107293.323 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4251/ 159576 | consumed samples: 88784 | elapsed time per iteration (ms): 14544.9 | learning rate: 2.459E-05 | global batch size: 32 | lm loss: 6.438442E+00 | loss scale: 16384.0 | grad norm: 105034.238 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4252/ 159576 | consumed samples: 88816 | elapsed time per iteration (ms): 14563.6 | learning rate: 2.460E-05 | global batch size: 32 | lm loss: 6.473608E+00 | loss scale: 16384.0 | grad norm: 84036.769 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4253/ 159576 | consumed samples: 88848 | elapsed time per iteration (ms): 14528.1 | learning rate: 2.461E-05 | global batch size: 32 | lm loss: 6.422614E+00 | loss scale: 16384.0 | grad norm: 95068.711 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4254/ 159576 | consumed samples: 88880 | elapsed time per iteration (ms): 14918.1 | learning rate: 2.462E-05 | global batch size: 32 | lm loss: 6.295578E+00 | loss scale: 16384.0 | grad norm: 114489.641 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4255/ 159576 | consumed samples: 88912 | elapsed time per iteration (ms): 14525.9 | learning rate: 2.463E-05 | global batch size: 32 | lm loss: 6.416272E+00 | loss scale: 16384.0 | grad norm: 91261.339 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4256/ 159576 | consumed samples: 88944 | elapsed time per iteration (ms): 14525.5 | learning rate: 2.463E-05 | global batch size: 32 | lm loss: 6.517479E+00 | loss scale: 32768.0 | grad norm: 94254.434 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4257/ 159576 | consumed samples: 88976 | elapsed time per iteration (ms): 14555.5 | learning rate: 2.464E-05 | global batch size: 32 | lm loss: 6.469455E+00 | loss scale: 32768.0 | grad norm: 174372.981 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4258/ 159576 | consumed samples: 89008 | elapsed time per iteration (ms): 14928.2 | learning rate: 2.465E-05 | global batch size: 32 | lm loss: 6.408867E+00 | loss scale: 32768.0 | grad norm: 205212.434 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4259/ 159576 | consumed samples: 89040 | elapsed time per iteration (ms): 14529.5 | learning rate: 2.466E-05 | global batch size: 32 | lm loss: 6.518348E+00 | loss scale: 32768.0 | grad norm: 175125.876 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4260/ 159576 | consumed samples: 89072 | elapsed time per iteration (ms): 14608.9 | learning rate: 2.467E-05 | global batch size: 32 | lm loss: 6.456366E+00 | loss scale: 32768.0 | grad norm: 180925.606 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4261/ 159576 | consumed samples: 89104 | elapsed time per iteration (ms): 14541.2 | learning rate: 2.468E-05 | global batch size: 32 | lm loss: 6.688640E+00 | loss scale: 32768.0 | grad norm: 205129.683 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4262/ 159576 | consumed samples: 89136 | elapsed time per iteration (ms): 14984.8 | learning rate: 2.469E-05 | global batch size: 32 | lm loss: 6.381848E+00 | loss scale: 32768.0 | grad norm: 194086.359 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4263/ 159576 | consumed samples: 89168 | elapsed time per iteration (ms): 14627.4 | learning rate: 2.470E-05 | global batch size: 32 | lm loss: 6.325251E+00 | loss scale: 32768.0 | grad norm: 200329.366 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4264/ 159576 | consumed samples: 89200 | elapsed time per iteration (ms): 14514.4 | learning rate: 2.471E-05 | global batch size: 32 | lm loss: 6.384187E+00 | loss scale: 32768.0 | grad norm: 206513.330 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4265/ 159576 | consumed samples: 89232 | elapsed time per iteration (ms): 14532.8 | learning rate: 2.471E-05 | global batch size: 32 | lm loss: 6.524798E+00 | loss scale: 32768.0 | grad norm: 207588.856 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4266/ 159576 | consumed samples: 89264 | elapsed time per iteration (ms): 14499.0 | learning rate: 2.472E-05 | global batch size: 32 | lm loss: 6.427965E+00 | loss scale: 32768.0 | grad norm: 270396.360 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4267/ 159576 | consumed samples: 89296 | elapsed time per iteration (ms): 14964.3 | learning rate: 2.473E-05 | global batch size: 32 | lm loss: 6.508441E+00 | loss scale: 32768.0 | grad norm: 256825.207 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4268/ 159576 | consumed samples: 89328 | elapsed time per iteration (ms): 14573.4 | learning rate: 2.474E-05 | global batch size: 32 | lm loss: 6.281446E+00 | loss scale: 32768.0 | grad norm: 175050.841 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4269/ 159576 | consumed samples: 89360 | elapsed time per iteration (ms): 14497.3 | learning rate: 2.475E-05 | global batch size: 32 | lm loss: 6.477619E+00 | loss scale: 32768.0 | grad norm: 194699.259 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4270/ 159576 | consumed samples: 89392 | elapsed time per iteration (ms): 14560.8 | learning rate: 2.476E-05 | global batch size: 32 | lm loss: 6.521669E+00 | loss scale: 32768.0 | grad norm: 204025.819 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4271/ 159576 | consumed samples: 89424 | elapsed time per iteration (ms): 14634.9 | learning rate: 2.477E-05 | global batch size: 32 | lm loss: 6.532991E+00 | loss scale: 32768.0 | grad norm: 218350.369 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4272/ 159576 | consumed samples: 89456 | elapsed time per iteration (ms): 14566.6 | learning rate: 2.478E-05 | global batch size: 32 | lm loss: 6.491451E+00 | loss scale: 32768.0 | grad norm: 196213.759 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4273/ 159576 | consumed samples: 89488 | elapsed time per iteration (ms): 14504.5 | learning rate: 2.479E-05 | global batch size: 32 | lm loss: 6.527338E+00 | loss scale: 32768.0 | grad norm: 254430.436 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4274/ 159576 | consumed samples: 89520 | elapsed time per iteration (ms): 14538.5 | learning rate: 2.479E-05 | global batch size: 32 | lm loss: 6.303001E+00 | loss scale: 32768.0 | grad norm: 189173.505 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4275/ 159576 | consumed samples: 89552 | elapsed time per iteration (ms): 14691.4 | learning rate: 2.480E-05 | global batch size: 32 | lm loss: 6.465518E+00 | loss scale: 32768.0 | grad norm: 266867.999 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4276/ 159576 | consumed samples: 89584 | elapsed time per iteration (ms): 14571.4 | learning rate: 2.481E-05 | global batch size: 32 | lm loss: 6.562708E+00 | loss scale: 32768.0 | grad norm: 213181.091 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4277/ 159576 | consumed samples: 89616 | elapsed time per iteration (ms): 14513.3 | learning rate: 2.482E-05 | global batch size: 32 | lm loss: 6.490031E+00 | loss scale: 32768.0 | grad norm: 200238.543 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4278/ 159576 | consumed samples: 89648 | elapsed time per iteration (ms): 14545.3 | learning rate: 2.483E-05 | global batch size: 32 | lm loss: 6.452188E+00 | loss scale: 32768.0 | grad norm: 209603.587 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4279/ 159576 | consumed samples: 89680 | elapsed time per iteration (ms): 14892.6 | learning rate: 2.484E-05 | global batch size: 32 | lm loss: 6.402837E+00 | loss scale: 32768.0 | grad norm: 213512.626 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4280/ 159576 | consumed samples: 89712 | elapsed time per iteration (ms): 14552.6 | learning rate: 2.485E-05 | global batch size: 32 | lm loss: 6.481530E+00 | loss scale: 32768.0 | grad norm: 218939.275 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4281/ 159576 | consumed samples: 89744 | elapsed time per iteration (ms): 14525.9 | learning rate: 2.486E-05 | global batch size: 32 | lm loss: 6.481557E+00 | loss scale: 32768.0 | grad norm: 211553.359 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4282/ 159576 | consumed samples: 89776 | elapsed time per iteration (ms): 14536.1 | learning rate: 2.487E-05 | global batch size: 32 | lm loss: 6.396571E+00 | loss scale: 32768.0 | grad norm: 200119.282 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4283/ 159576 | consumed samples: 89808 | elapsed time per iteration (ms): 14897.4 | learning rate: 2.487E-05 | global batch size: 32 | lm loss: 6.437448E+00 | loss scale: 32768.0 | grad norm: 211733.893 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4284/ 159576 | consumed samples: 89840 | elapsed time per iteration (ms): 14635.9 | learning rate: 2.488E-05 | global batch size: 32 | lm loss: 6.477830E+00 | loss scale: 32768.0 | grad norm: 273937.689 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4285/ 159576 | consumed samples: 89872 | elapsed time per iteration (ms): 14565.4 | learning rate: 2.489E-05 | global batch size: 32 | lm loss: 6.567824E+00 | loss scale: 32768.0 | grad norm: 210402.154 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4286/ 159576 | consumed samples: 89904 | elapsed time per iteration (ms): 14519.6 | learning rate: 2.490E-05 | global batch size: 32 | lm loss: 6.385768E+00 | loss scale: 32768.0 | grad norm: 203200.040 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4287/ 159576 | consumed samples: 89936 | elapsed time per iteration (ms): 14914.9 | learning rate: 2.491E-05 | global batch size: 32 | lm loss: 6.397992E+00 | loss scale: 32768.0 | grad norm: 182816.610 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4288/ 159576 | consumed samples: 89968 | elapsed time per iteration (ms): 14476.6 | learning rate: 2.492E-05 | global batch size: 32 | lm loss: 6.388610E+00 | loss scale: 32768.0 | grad norm: 199735.518 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4289/ 159576 | consumed samples: 90000 | elapsed time per iteration (ms): 14570.5 | learning rate: 2.493E-05 | global batch size: 32 | lm loss: 6.506209E+00 | loss scale: 32768.0 | grad norm: 206990.921 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4290/ 159576 | consumed samples: 90032 | elapsed time per iteration (ms): 14531.9 | learning rate: 2.494E-05 | global batch size: 32 | lm loss: 6.351604E+00 | loss scale: 32768.0 | grad norm: 204481.534 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4291/ 159576 | consumed samples: 90064 | elapsed time per iteration (ms): 14860.6 | learning rate: 2.495E-05 | global batch size: 32 | lm loss: 6.518882E+00 | loss scale: 32768.0 | grad norm: 236219.696 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4292/ 159576 | consumed samples: 90096 | elapsed time per iteration (ms): 14581.4 | learning rate: 2.495E-05 | global batch size: 32 | lm loss: 6.428777E+00 | loss scale: 32768.0 | grad norm: 187907.904 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4293/ 159576 | consumed samples: 90128 | elapsed time per iteration (ms): 14508.1 | learning rate: 2.496E-05 | global batch size: 32 | lm loss: 6.327142E+00 | loss scale: 32768.0 | grad norm: 204872.451 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4294/ 159576 | consumed samples: 90160 | elapsed time per iteration (ms): 14534.7 | learning rate: 2.497E-05 | global batch size: 32 | lm loss: 6.385339E+00 | loss scale: 32768.0 | grad norm: 233375.233 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4295/ 159576 | consumed samples: 90192 | elapsed time per iteration (ms): 14858.3 | learning rate: 2.498E-05 | global batch size: 32 | lm loss: 6.416627E+00 | loss scale: 32768.0 | grad norm: 222806.309 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4296/ 159576 | consumed samples: 90224 | elapsed time per iteration (ms): 14474.6 | learning rate: 2.499E-05 | global batch size: 32 | lm loss: 6.518059E+00 | loss scale: 32768.0 | grad norm: 226593.449 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4297/ 159576 | consumed samples: 90256 | elapsed time per iteration (ms): 14569.0 | learning rate: 2.500E-05 | global batch size: 32 | lm loss: 6.133147E+00 | loss scale: 32768.0 | grad norm: 267419.394 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4298/ 159576 | consumed samples: 90288 | elapsed time per iteration (ms): 14566.4 | learning rate: 2.501E-05 | global batch size: 32 | lm loss: 6.308548E+00 | loss scale: 32768.0 | grad norm: 204598.561 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4299/ 159576 | consumed samples: 90320 | elapsed time per iteration (ms): 14984.7 | learning rate: 2.502E-05 | global batch size: 32 | lm loss: 6.369866E+00 | loss scale: 32768.0 | grad norm: 221545.190 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4300/ 159576 | consumed samples: 90352 | elapsed time per iteration (ms): 14484.6 | learning rate: 2.503E-05 | global batch size: 32 | lm loss: 6.530766E+00 | loss scale: 32768.0 | grad norm: 267800.060 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4301/ 159576 | consumed samples: 90384 | elapsed time per iteration (ms): 14557.5 | learning rate: 2.503E-05 | global batch size: 32 | lm loss: 6.503004E+00 | loss scale: 32768.0 | grad norm: 228461.361 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4302/ 159576 | consumed samples: 90416 | elapsed time per iteration (ms): 14550.0 | learning rate: 2.504E-05 | global batch size: 32 | lm loss: 6.538440E+00 | loss scale: 32768.0 | grad norm: 190026.980 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4303/ 159576 | consumed samples: 90448 | elapsed time per iteration (ms): 14655.7 | learning rate: 2.505E-05 | global batch size: 32 | lm loss: 6.461242E+00 | loss scale: 32768.0 | grad norm: 211257.650 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4304/ 159576 | consumed samples: 90480 | elapsed time per iteration (ms): 14769.1 | learning rate: 2.506E-05 | global batch size: 32 | lm loss: 6.479248E+00 | loss scale: 32768.0 | grad norm: 198712.587 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4305/ 159576 | consumed samples: 90512 | elapsed time per iteration (ms): 14577.3 | learning rate: 2.507E-05 | global batch size: 32 | lm loss: 6.432651E+00 | loss scale: 32768.0 | grad norm: 206822.372 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4306/ 159576 | consumed samples: 90544 | elapsed time per iteration (ms): 14533.2 | learning rate: 2.508E-05 | global batch size: 32 | lm loss: 6.347961E+00 | loss scale: 32768.0 | grad norm: 195748.989 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4307/ 159576 | consumed samples: 90576 | elapsed time per iteration (ms): 14563.8 | learning rate: 2.509E-05 | global batch size: 32 | lm loss: 6.507642E+00 | loss scale: 32768.0 | grad norm: 218663.158 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4308/ 159576 | consumed samples: 90608 | elapsed time per iteration (ms): 14732.7 | learning rate: 2.510E-05 | global batch size: 32 | lm loss: 6.541059E+00 | loss scale: 32768.0 | grad norm: 228970.274 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4309/ 159576 | consumed samples: 90640 | elapsed time per iteration (ms): 14469.9 | learning rate: 2.511E-05 | global batch size: 32 | lm loss: 6.424891E+00 | loss scale: 32768.0 | grad norm: 196198.889 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4310/ 159576 | consumed samples: 90672 | elapsed time per iteration (ms): 14508.3 | learning rate: 2.511E-05 | global batch size: 32 | lm loss: 6.490376E+00 | loss scale: 32768.0 | grad norm: 215960.903 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4311/ 159576 | consumed samples: 90704 | elapsed time per iteration (ms): 14508.3 | learning rate: 2.512E-05 | global batch size: 32 | lm loss: 6.488754E+00 | loss scale: 32768.0 | grad norm: 195374.466 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4312/ 159576 | consumed samples: 90736 | elapsed time per iteration (ms): 14753.9 | learning rate: 2.513E-05 | global batch size: 32 | lm loss: 6.448671E+00 | loss scale: 32768.0 | grad norm: 227732.025 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4313/ 159576 | consumed samples: 90768 | elapsed time per iteration (ms): 14571.8 | learning rate: 2.514E-05 | global batch size: 32 | lm loss: 6.500753E+00 | loss scale: 32768.0 | grad norm: 266264.636 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4314/ 159576 | consumed samples: 90800 | elapsed time per iteration (ms): 14601.7 | learning rate: 2.515E-05 | global batch size: 32 | lm loss: 6.454448E+00 | loss scale: 32768.0 | grad norm: 224312.848 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4315/ 159576 | consumed samples: 90832 | elapsed time per iteration (ms): 14520.9 | learning rate: 2.516E-05 | global batch size: 32 | lm loss: 6.340928E+00 | loss scale: 32768.0 | grad norm: 252168.513 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4316/ 159576 | consumed samples: 90864 | elapsed time per iteration (ms): 14650.6 | learning rate: 2.517E-05 | global batch size: 32 | lm loss: 6.524774E+00 | loss scale: 32768.0 | grad norm: 233060.511 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4317/ 159576 | consumed samples: 90896 | elapsed time per iteration (ms): 14507.8 | learning rate: 2.518E-05 | global batch size: 32 | lm loss: 6.526123E+00 | loss scale: 32768.0 | grad norm: 228145.157 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4318/ 159576 | consumed samples: 90928 | elapsed time per iteration (ms): 14505.6 | learning rate: 2.518E-05 | global batch size: 32 | lm loss: 6.554380E+00 | loss scale: 32768.0 | grad norm: 215247.212 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
[2021-09-24 19:07:09] PULSE: tr8-104B is waiting for the previous job to finish before scheduling a new one using the dependency mechanism (1165978_[1-10%1] on 'gpu_p13' partition)
[2021-09-24 19:07:09] PULSE: tr8-104B is running for 13:14:58 since 2021-09-24T05:52:11 (1162855_1 on 'gpu_p13' partition (r6i4n[5,7],r6i5n[2,7-8],r6i6n[0,2,6],r7i2n[4-5],r7i6n[2-4],r7i7n[7-8],r8i0n[2-3,5-8],r8i1n[0,2-4],r8i2n8,r8i3n[0-2],r8i5n[3-4],r8i7n[3-8],r9i0n[0-2],r9i1n[0-3],r9i2n[3-5,8],r9i3n[0-1,7-8],r9i4n[0-2],r9i5n[3-8],r9i6n[0,7-8])
iteration 4319/ 159576 | consumed samples: 90960 | elapsed time per iteration (ms): 14496.4 | learning rate: 2.519E-05 | global batch size: 32 | lm loss: 6.312326E+00 | loss scale: 32768.0 | grad norm: 214751.055 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4320/ 159576 | consumed samples: 90992 | elapsed time per iteration (ms): 14941.6 | learning rate: 2.520E-05 | global batch size: 32 | lm loss: 6.452510E+00 | loss scale: 32768.0 | grad norm: 260142.714 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4321/ 159576 | consumed samples: 91024 | elapsed time per iteration (ms): 14618.7 | learning rate: 2.521E-05 | global batch size: 32 | lm loss: 6.420647E+00 | loss scale: 32768.0 | grad norm: 225655.261 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4322/ 159576 | consumed samples: 91056 | elapsed time per iteration (ms): 14566.6 | learning rate: 2.522E-05 | global batch size: 32 | lm loss: 6.402806E+00 | loss scale: 32768.0 | grad norm: 291928.342 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4323/ 159576 | consumed samples: 91088 | elapsed time per iteration (ms): 14498.7 | learning rate: 2.523E-05 | global batch size: 32 | lm loss: 6.391022E+00 | loss scale: 32768.0 | grad norm: 237551.777 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4324/ 159576 | consumed samples: 91120 | elapsed time per iteration (ms): 15211.7 | learning rate: 2.524E-05 | global batch size: 32 | lm loss: 6.430393E+00 | loss scale: 32768.0 | grad norm: 234733.593 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4325/ 159576 | consumed samples: 91152 | elapsed time per iteration (ms): 14439.1 | learning rate: 2.525E-05 | global batch size: 32 | lm loss: 6.406878E+00 | loss scale: 32768.0 | grad norm: 212091.318 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4326/ 159576 | consumed samples: 91184 | elapsed time per iteration (ms): 14533.1 | learning rate: 2.526E-05 | global batch size: 32 | lm loss: 6.439167E+00 | loss scale: 32768.0 | grad norm: 244000.757 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4327/ 159576 | consumed samples: 91216 | elapsed time per iteration (ms): 14508.9 | learning rate: 2.526E-05 | global batch size: 32 | lm loss: 6.334565E+00 | loss scale: 32768.0 | grad norm: 183767.589 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4328/ 159576 | consumed samples: 91248 | elapsed time per iteration (ms): 14921.5 | learning rate: 2.527E-05 | global batch size: 32 | lm loss: 6.456017E+00 | loss scale: 32768.0 | grad norm: 239736.759 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4329/ 159576 | consumed samples: 91280 | elapsed time per iteration (ms): 14572.2 | learning rate: 2.528E-05 | global batch size: 32 | lm loss: 6.367092E+00 | loss scale: 32768.0 | grad norm: 195126.741 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4330/ 159576 | consumed samples: 91312 | elapsed time per iteration (ms): 14531.1 | learning rate: 2.529E-05 | global batch size: 32 | lm loss: 6.383262E+00 | loss scale: 32768.0 | grad norm: 208256.244 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4331/ 159576 | consumed samples: 91344 | elapsed time per iteration (ms): 14591.9 | learning rate: 2.530E-05 | global batch size: 32 | lm loss: 6.502596E+00 | loss scale: 32768.0 | grad norm: 248824.057 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4332/ 159576 | consumed samples: 91376 | elapsed time per iteration (ms): 14794.2 | learning rate: 2.531E-05 | global batch size: 32 | lm loss: 6.386366E+00 | loss scale: 32768.0 | grad norm: 223413.013 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4333/ 159576 | consumed samples: 91408 | elapsed time per iteration (ms): 14447.8 | learning rate: 2.532E-05 | global batch size: 32 | lm loss: 6.470964E+00 | loss scale: 32768.0 | grad norm: 220869.102 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4334/ 159576 | consumed samples: 91440 | elapsed time per iteration (ms): 14523.5 | learning rate: 2.533E-05 | global batch size: 32 | lm loss: 6.423388E+00 | loss scale: 32768.0 | grad norm: 204896.062 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4335/ 159576 | consumed samples: 91472 | elapsed time per iteration (ms): 14548.8 | learning rate: 2.534E-05 | global batch size: 32 | lm loss: 6.516037E+00 | loss scale: 32768.0 | grad norm: 214455.132 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4336/ 159576 | consumed samples: 91504 | elapsed time per iteration (ms): 14925.7 | learning rate: 2.534E-05 | global batch size: 32 | lm loss: 6.420337E+00 | loss scale: 32768.0 | grad norm: 252272.858 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4337/ 159576 | consumed samples: 91536 | elapsed time per iteration (ms): 14576.6 | learning rate: 2.535E-05 | global batch size: 32 | lm loss: 6.464952E+00 | loss scale: 32768.0 | grad norm: 193893.530 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4338/ 159576 | consumed samples: 91568 | elapsed time per iteration (ms): 14502.1 | learning rate: 2.536E-05 | global batch size: 32 | lm loss: 6.492158E+00 | loss scale: 32768.0 | grad norm: 243709.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4339/ 159576 | consumed samples: 91600 | elapsed time per iteration (ms): 14503.5 | learning rate: 2.537E-05 | global batch size: 32 | lm loss: 6.239275E+00 | loss scale: 32768.0 | grad norm: 206242.433 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4340/ 159576 | consumed samples: 91632 | elapsed time per iteration (ms): 14881.4 | learning rate: 2.538E-05 | global batch size: 32 | lm loss: 6.484446E+00 | loss scale: 32768.0 | grad norm: 213552.367 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4341/ 159576 | consumed samples: 91664 | elapsed time per iteration (ms): 14651.1 | learning rate: 2.539E-05 | global batch size: 32 | lm loss: 6.419237E+00 | loss scale: 32768.0 | grad norm: 210520.111 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4342/ 159576 | consumed samples: 91696 | elapsed time per iteration (ms): 14512.3 | learning rate: 2.540E-05 | global batch size: 32 | lm loss: 6.452721E+00 | loss scale: 32768.0 | grad norm: 238634.139 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4343/ 159576 | consumed samples: 91728 | elapsed time per iteration (ms): 14558.7 | learning rate: 2.541E-05 | global batch size: 32 | lm loss: 6.347074E+00 | loss scale: 32768.0 | grad norm: 202447.417 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4344/ 159576 | consumed samples: 91760 | elapsed time per iteration (ms): 14594.4 | learning rate: 2.542E-05 | global batch size: 32 | lm loss: 6.520543E+00 | loss scale: 32768.0 | grad norm: 239073.554 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4345/ 159576 | consumed samples: 91792 | elapsed time per iteration (ms): 14908.5 | learning rate: 2.542E-05 | global batch size: 32 | lm loss: 6.421722E+00 | loss scale: 32768.0 | grad norm: 217284.913 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4346/ 159576 | consumed samples: 91824 | elapsed time per iteration (ms): 14533.0 | learning rate: 2.543E-05 | global batch size: 32 | lm loss: 6.272108E+00 | loss scale: 32768.0 | grad norm: 200271.872 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4347/ 159576 | consumed samples: 91856 | elapsed time per iteration (ms): 14569.7 | learning rate: 2.544E-05 | global batch size: 32 | lm loss: 6.532617E+00 | loss scale: 32768.0 | grad norm: 194761.374 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4348/ 159576 | consumed samples: 91888 | elapsed time per iteration (ms): 14475.9 | learning rate: 2.545E-05 | global batch size: 32 | lm loss: 6.471928E+00 | loss scale: 32768.0 | grad norm: 217213.277 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4349/ 159576 | consumed samples: 91920 | elapsed time per iteration (ms): 14760.6 | learning rate: 2.546E-05 | global batch size: 32 | lm loss: 6.416161E+00 | loss scale: 32768.0 | grad norm: 224313.842 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4350/ 159576 | consumed samples: 91952 | elapsed time per iteration (ms): 14554.3 | learning rate: 2.547E-05 | global batch size: 32 | lm loss: 6.550965E+00 | loss scale: 32768.0 | grad norm: 241887.267 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4351/ 159576 | consumed samples: 91984 | elapsed time per iteration (ms): 14563.9 | learning rate: 2.548E-05 | global batch size: 32 | lm loss: 6.496109E+00 | loss scale: 32768.0 | grad norm: 216683.843 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4352/ 159576 | consumed samples: 92016 | elapsed time per iteration (ms): 14514.3 | learning rate: 2.549E-05 | global batch size: 32 | lm loss: 6.359037E+00 | loss scale: 32768.0 | grad norm: 205500.964 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4353/ 159576 | consumed samples: 92048 | elapsed time per iteration (ms): 14703.1 | learning rate: 2.550E-05 | global batch size: 32 | lm loss: 6.333501E+00 | loss scale: 32768.0 | grad norm: 326501.197 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4354/ 159576 | consumed samples: 92080 | elapsed time per iteration (ms): 14558.2 | learning rate: 2.550E-05 | global batch size: 32 | lm loss: 6.455669E+00 | loss scale: 32768.0 | grad norm: 254904.658 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4355/ 159576 | consumed samples: 92112 | elapsed time per iteration (ms): 14511.5 | learning rate: 2.551E-05 | global batch size: 32 | lm loss: 6.509322E+00 | loss scale: 32768.0 | grad norm: 237041.501 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4356/ 159576 | consumed samples: 92144 | elapsed time per iteration (ms): 14539.0 | learning rate: 2.552E-05 | global batch size: 32 | lm loss: 6.356802E+00 | loss scale: 32768.0 | grad norm: 268871.419 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4357/ 159576 | consumed samples: 92176 | elapsed time per iteration (ms): 14822.4 | learning rate: 2.553E-05 | global batch size: 32 | lm loss: 6.599571E+00 | loss scale: 32768.0 | grad norm: 283473.183 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4358/ 159576 | consumed samples: 92208 | elapsed time per iteration (ms): 14612.7 | learning rate: 2.554E-05 | global batch size: 32 | lm loss: 6.308304E+00 | loss scale: 32768.0 | grad norm: 231784.921 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4359/ 159576 | consumed samples: 92240 | elapsed time per iteration (ms): 14524.9 | learning rate: 2.555E-05 | global batch size: 32 | lm loss: 6.395612E+00 | loss scale: 32768.0 | grad norm: 270045.717 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4360/ 159576 | consumed samples: 92272 | elapsed time per iteration (ms): 14601.7 | learning rate: 2.556E-05 | global batch size: 32 | lm loss: 6.525626E+00 | loss scale: 32768.0 | grad norm: 275256.199 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4361/ 159576 | consumed samples: 92304 | elapsed time per iteration (ms): 14951.2 | learning rate: 2.557E-05 | global batch size: 32 | lm loss: 6.457727E+00 | loss scale: 32768.0 | grad norm: 277346.905 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4362/ 159576 | consumed samples: 92336 | elapsed time per iteration (ms): 14507.2 | learning rate: 2.558E-05 | global batch size: 32 | lm loss: 6.423290E+00 | loss scale: 32768.0 | grad norm: 259149.362 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4363/ 159576 | consumed samples: 92368 | elapsed time per iteration (ms): 14519.9 | learning rate: 2.558E-05 | global batch size: 32 | lm loss: 6.385529E+00 | loss scale: 32768.0 | grad norm: 288729.160 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4364/ 159576 | consumed samples: 92400 | elapsed time per iteration (ms): 14590.0 | learning rate: 2.559E-05 | global batch size: 32 | lm loss: 6.344237E+00 | loss scale: 32768.0 | grad norm: 224867.437 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4365/ 159576 | consumed samples: 92432 | elapsed time per iteration (ms): 15022.1 | learning rate: 2.560E-05 | global batch size: 32 | lm loss: 6.361878E+00 | loss scale: 32768.0 | grad norm: 317761.599 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4366/ 159576 | consumed samples: 92464 | elapsed time per iteration (ms): 14751.4 | learning rate: 2.561E-05 | global batch size: 32 | lm loss: 6.330537E+00 | loss scale: 32768.0
| grad norm: 265015.375 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4367/ 159576 | consumed samples: 92496 | elapsed time per iteration (ms): 14614.0 | learning rate: 2.562E-05 | global batch size: 32 | lm loss: 6.148376E+00 | loss scale: 32768.0 | grad norm: 264202.339 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4368/ 159576 | consumed samples: 92528 | elapsed time per iteration (ms): 14584.5 | learning rate: 2.563E-05 | global batch size: 32 | lm loss: 6.479382E+00 | loss scale: 32768.0 | grad norm: 264375.223 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4369/ 159576 | consumed samples: 92560 | elapsed time per iteration (ms): 14918.5 | learning rate: 2.564E-05 | global batch size: 32 | lm loss: 6.363014E+00 | loss scale: 32768.0 | grad norm: 226102.568 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4370/ 159576 | consumed samples: 92592 | elapsed time per iteration (ms): 14489.4 | learning rate: 2.565E-05 | global batch size: 32 | lm loss: 6.437625E+00 | loss scale: 32768.0 | grad norm: 280139.331 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4371/ 159576 | consumed samples: 92624 | elapsed time per iteration (ms): 14515.3 | learning rate: 2.566E-05 | global batch size: 32 | lm loss: 6.394330E+00 | loss scale: 32768.0 | grad norm: 290041.946 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4372/ 159576 | consumed samples: 92656 | elapsed time per iteration (ms): 14519.6 | learning rate: 2.566E-05 | global batch size: 32 | lm loss: 6.430163E+00 | loss scale: 32768.0 | grad norm: 318528.997 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4373/ 159576 | consumed samples: 
92688 | elapsed time per iteration (ms): 14816.9 | learning rate: 2.567E-05 | global batch size: 32 | lm loss: 6.494810E+00 | loss scale: 32768.0 | grad norm: 279939.060 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4374/ 159576 | consumed samples: 92720 | elapsed time per iteration (ms): 14615.4 | learning rate: 2.568E-05 | global batch size: 32 | lm loss: 6.431265E+00 | loss scale: 32768.0 | grad norm: 260943.403 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4375/ 159576 | consumed samples: 92752 | elapsed time per iteration (ms): 14539.2 | learning rate: 2.569E-05 | global batch size: 32 | lm loss: 6.365846E+00 | loss scale: 32768.0 | grad norm: 614516.527 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4376/ 159576 | consumed samples: 92784 | elapsed time per iteration (ms): 14560.9 | learning rate: 2.570E-05 | global batch size: 32 | lm loss: 6.306572E+00 | loss scale: 32768.0 | grad norm: 303539.975 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4377/ 159576 | consumed samples: 92816 | elapsed time per iteration (ms): 14894.6 | learning rate: 2.571E-05 | global batch size: 32 | lm loss: 6.444806E+00 | loss scale: 32768.0 | grad norm: 305405.289 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4378/ 159576 | consumed samples: 92848 | elapsed time per iteration (ms): 14498.0 | learning rate: 2.572E-05 | global batch size: 32 | lm loss: 6.475850E+00 | loss scale: 32768.0 | grad norm: 302245.775 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4379/ 159576 | consumed samples: 92880 | elapsed time per iteration (ms): 14519.5 | learning rate: 2.573E-05 | global batch size: 32 | lm loss: 6.470803E+00 | loss scale: 32768.0 | grad norm: 
302163.447 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4380/ 159576 | consumed samples: 92912 | elapsed time per iteration (ms): 14547.1 | learning rate: 2.574E-05 | global batch size: 32 | lm loss: 6.285831E+00 | loss scale: 32768.0 | grad norm: 245533.159 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4381/ 159576 | consumed samples: 92944 | elapsed time per iteration (ms): 14903.6 | learning rate: 2.574E-05 | global batch size: 32 | lm loss: 6.382543E+00 | loss scale: 32768.0 | grad norm: 256847.499 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4382/ 159576 | consumed samples: 92976 | elapsed time per iteration (ms): 14746.3 | learning rate: 2.575E-05 | global batch size: 32 | lm loss: 6.377112E+00 | loss scale: 32768.0 | grad norm: 234822.067 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4383/ 159576 | consumed samples: 93008 | elapsed time per iteration (ms): 14580.0 | learning rate: 2.576E-05 | global batch size: 32 | lm loss: 6.412641E+00 | loss scale: 32768.0 | grad norm: 343040.768 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4384/ 159576 | consumed samples: 93040 | elapsed time per iteration (ms): 14506.7 | learning rate: 2.577E-05 | global batch size: 32 | lm loss: 6.416348E+00 | loss scale: 32768.0 | grad norm: 291818.464 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4385/ 159576 | consumed samples: 93072 | elapsed time per iteration (ms): 14512.2 | learning rate: 2.578E-05 | global batch size: 32 | lm loss: 6.425752E+00 | loss scale: 32768.0 | grad norm: 323662.796 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4386/ 159576 | consumed samples: 93104 | elapsed 
time per iteration (ms): 14928.6 | learning rate: 2.579E-05 | global batch size: 32 | lm loss: 6.318911E+00 | loss scale: 32768.0 | grad norm: 305616.292 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4387/ 159576 | consumed samples: 93136 | elapsed time per iteration (ms): 14506.3 | learning rate: 2.580E-05 | global batch size: 32 | lm loss: 6.531947E+00 | loss scale: 32768.0 | grad norm: 350201.540 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4388/ 159576 | consumed samples: 93168 | elapsed time per iteration (ms): 14556.8 | learning rate: 2.581E-05 | global batch size: 32 | lm loss: 6.376329E+00 | loss scale: 32768.0 | grad norm: 345044.523 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4389/ 159576 | consumed samples: 93200 | elapsed time per iteration (ms): 14537.0 | learning rate: 2.582E-05 | global batch size: 32 | lm loss: 6.381351E+00 | loss scale: 32768.0 | grad norm: 285108.825 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4390/ 159576 | consumed samples: 93232 | elapsed time per iteration (ms): 14792.9 | learning rate: 2.582E-05 | global batch size: 32 | lm loss: 6.367733E+00 | loss scale: 32768.0 | grad norm: 443607.853 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4391/ 159576 | consumed samples: 93264 | elapsed time per iteration (ms): 14536.7 | learning rate: 2.583E-05 | global batch size: 32 | lm loss: 6.404822E+00 | loss scale: 32768.0 | grad norm: 266018.610 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4392/ 159576 | consumed samples: 93296 | elapsed time per iteration (ms): 14465.3 | learning rate: 2.584E-05 | global batch size: 32 | lm loss: 6.460493E+00 | loss scale: 32768.0 | grad norm: 388305.684 | num 
zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4393/ 159576 | consumed samples: 93328 | elapsed time per iteration (ms): 14549.7 | learning rate: 2.585E-05 | global batch size: 32 | lm loss: 6.312160E+00 | loss scale: 32768.0 | grad norm: 289444.907 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4394/ 159576 | consumed samples: 93360 | elapsed time per iteration (ms): 14712.4 | learning rate: 2.586E-05 | global batch size: 32 | lm loss: 6.447091E+00 | loss scale: 32768.0 | grad norm: 310866.794 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4395/ 159576 | consumed samples: 93392 | elapsed time per iteration (ms): 14507.9 | learning rate: 2.587E-05 | global batch size: 32 | lm loss: 6.358830E+00 | loss scale: 32768.0 | grad norm: 254147.069 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4396/ 159576 | consumed samples: 93424 | elapsed time per iteration (ms): 14549.6 | learning rate: 2.588E-05 | global batch size: 32 | lm loss: 6.406147E+00 | loss scale: 32768.0 | grad norm: 368220.982 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4397/ 159576 | consumed samples: 93456 | elapsed time per iteration (ms): 14535.1 | learning rate: 2.589E-05 | global batch size: 32 | lm loss: 6.511951E+00 | loss scale: 32768.0 | grad norm: 306021.916 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4398/ 159576 | consumed samples: 93488 | elapsed time per iteration (ms): 14834.9 | learning rate: 2.589E-05 | global batch size: 32 | lm loss: 6.344939E+00 | loss scale: 32768.0 | grad norm: 244440.220 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4399/ 159576 | consumed samples: 93520 | elapsed time per 
iteration (ms): 14561.9 | learning rate: 2.590E-05 | global batch size: 32 | lm loss: 6.408576E+00 | loss scale: 32768.0 | grad norm: 331789.025 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4400/ 159576 | consumed samples: 93552 | elapsed time per iteration (ms): 14527.0 | learning rate: 2.591E-05 | global batch size: 32 | lm loss: 6.405599E+00 | loss scale: 32768.0 | grad norm: 389927.053 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4401/ 159576 | consumed samples: 93584 | elapsed time per iteration (ms): 14530.9 | learning rate: 2.592E-05 | global batch size: 32 | lm loss: 6.461980E+00 | loss scale: 32768.0 | grad norm: 344518.886 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4402/ 159576 | consumed samples: 93616 | elapsed time per iteration (ms): 15042.1 | learning rate: 2.593E-05 | global batch size: 32 | lm loss: 6.416601E+00 | loss scale: 32768.0 | grad norm: 310590.140 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4403/ 159576 | consumed samples: 93648 | elapsed time per iteration (ms): 14634.8 | learning rate: 2.594E-05 | global batch size: 32 | lm loss: 6.546180E+00 | loss scale: 32768.0 | grad norm: 267385.444 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4404/ 159576 | consumed samples: 93680 | elapsed time per iteration (ms): 14549.2 | learning rate: 2.595E-05 | global batch size: 32 | lm loss: 6.399436E+00 | loss scale: 32768.0 | grad norm: 298662.044 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4405/ 159576 | consumed samples: 93712 | elapsed time per iteration (ms): 14489.5 | learning rate: 2.596E-05 | global batch size: 32 | lm loss: 6.306044E+00 | loss scale: 32768.0 | grad norm: 302499.736 | num zeros: 0.0 | 
number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4406/ 159576 | consumed samples: 93744 | elapsed time per iteration (ms): 14963.1 | learning rate: 2.597E-05 | global batch size: 32 | lm loss: 6.504598E+00 | loss scale: 32768.0 | grad norm: 315577.594 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4407/ 159576 | consumed samples: 93776 | elapsed time per iteration (ms): 14516.0 | learning rate: 2.597E-05 | global batch size: 32 | lm loss: 6.229925E+00 | loss scale: 32768.0 | grad norm: 238182.668 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4408/ 159576 | consumed samples: 93808 | elapsed time per iteration (ms): 14496.6 | learning rate: 2.598E-05 | global batch size: 32 | lm loss: 6.414362E+00 | loss scale: 32768.0 | grad norm: 274509.689 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4409/ 159576 | consumed samples: 93840 | elapsed time per iteration (ms): 14543.5 | learning rate: 2.599E-05 | global batch size: 32 | lm loss: 6.355350E+00 | loss scale: 32768.0 | grad norm: 288329.828 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4410/ 159576 | consumed samples: 93872 | elapsed time per iteration (ms): 14875.5 | learning rate: 2.600E-05 | global batch size: 32 | lm loss: 6.366935E+00 | loss scale: 32768.0 | grad norm: 252983.122 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4411/ 159576 | consumed samples: 93904 | elapsed time per iteration (ms): 14456.2 | learning rate: 2.601E-05 | global batch size: 32 | lm loss: 6.458515E+00 | loss scale: 32768.0 | grad norm: 210575.780 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4412/ 159576 | consumed samples: 93936 | elapsed time per iteration (ms): 
14560.7 | learning rate: 2.602E-05 | global batch size: 32 | lm loss: 6.472146E+00 | loss scale: 32768.0 | grad norm: 237114.094 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4413/ 159576 | consumed samples: 93968 | elapsed time per iteration (ms): 14587.5 | learning rate: 2.603E-05 | global batch size: 32 | lm loss: 6.359771E+00 | loss scale: 32768.0 | grad norm: 252911.801 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4414/ 159576 | consumed samples: 94000 | elapsed time per iteration (ms): 14804.6 | learning rate: 2.604E-05 | global batch size: 32 | lm loss: 6.563889E+00 | loss scale: 32768.0 | grad norm: 296794.210 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4415/ 159576 | consumed samples: 94032 | elapsed time per iteration (ms): 14512.9 | learning rate: 2.605E-05 | global batch size: 32 | lm loss: 6.413787E+00 | loss scale: 32768.0 | grad norm: 272034.826 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4416/ 159576 | consumed samples: 94064 | elapsed time per iteration (ms): 14494.5 | learning rate: 2.605E-05 | global batch size: 32 | lm loss: 6.443899E+00 | loss scale: 32768.0 | grad norm: 290284.950 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4417/ 159576 | consumed samples: 94096 | elapsed time per iteration (ms): 14536.8 | learning rate: 2.606E-05 | global batch size: 32 | lm loss: 6.472334E+00 | loss scale: 32768.0 | grad norm: 248961.089 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4418/ 159576 | consumed samples: 94128 | elapsed time per iteration (ms): 14975.6 | learning rate: 2.607E-05 | global batch size: 32 | lm loss: 6.557878E+00 | loss scale: 32768.0 | grad norm: 330814.857 | num zeros: 0.0 | number of 
skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4419/ 159576 | consumed samples: 94160 | elapsed time per iteration (ms): 14477.8 | learning rate: 2.608E-05 | global batch size: 32 | lm loss: 6.499488E+00 | loss scale: 32768.0 | grad norm: 268804.004 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4420/ 159576 | consumed samples: 94192 | elapsed time per iteration (ms): 14628.8 | learning rate: 2.609E-05 | global batch size: 32 | lm loss: 6.312944E+00 | loss scale: 32768.0 | grad norm: 264253.854 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4421/ 159576 | consumed samples: 94224 | elapsed time per iteration (ms): 14519.9 | learning rate: 2.610E-05 | global batch size: 32 | lm loss: 6.392362E+00 | loss scale: 32768.0 | grad norm: 255470.733 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4422/ 159576 | consumed samples: 94256 | elapsed time per iteration (ms): 14805.5 | learning rate: 2.611E-05 | global batch size: 32 | lm loss: 6.375703E+00 | loss scale: 32768.0 | grad norm: 246267.346 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4423/ 159576 | consumed samples: 94288 | elapsed time per iteration (ms): 14680.3 | learning rate: 2.612E-05 | global batch size: 32 | lm loss: 6.523773E+00 | loss scale: 32768.0 | grad norm: 281090.751 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4424/ 159576 | consumed samples: 94320 | elapsed time per iteration (ms): 7706.4 | learning rate: 2.612E-05 | global batch size: 32 | lm loss: 6.355268E+00 | loss scale: 32768.0 | grad norm: 281090.751 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4425/ 159576 | consumed samples: 94352 | elapsed time per iteration (ms): 13992.5 | 
learning rate: 2.613E-05 | global batch size: 32 | lm loss: 6.391113E+00 | loss scale: 32768.0 | grad norm: 235806.214 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4426/ 159576 | consumed samples: 94384 | elapsed time per iteration (ms): 14643.4 | learning rate: 2.613E-05 | global batch size: 32 | lm loss: 6.483145E+00 | loss scale: 32768.0 | grad norm: 316001.533 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4427/ 159576 | consumed samples: 94416 | elapsed time per iteration (ms): 14931.0 | learning rate: 2.614E-05 | global batch size: 32 | lm loss: 6.419625E+00 | loss scale: 32768.0 | grad norm: 595148.752 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4428/ 159576 | consumed samples: 94448 | elapsed time per iteration (ms): 14542.3 | learning rate: 2.615E-05 | global batch size: 32 | lm loss: 6.463273E+00 | loss scale: 32768.0 | grad norm: 310708.077 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4429/ 159576 | consumed samples: 94480 | elapsed time per iteration (ms): 14522.5 | learning rate: 2.616E-05 | global batch size: 32 | lm loss: 6.427548E+00 | loss scale: 32768.0 | grad norm: 324018.149 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4430/ 159576 | consumed samples: 94512 | elapsed time per iteration (ms): 14489.9 | learning rate: 2.617E-05 | global batch size: 32 | lm loss: 6.385033E+00 | loss scale: 32768.0 | grad norm: 244981.121 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4431/ 159576 | consumed samples: 94560 | elapsed time per iteration (ms): 15763.7 | learning rate: 2.618E-05 | global batch size: 48 | lm loss: 6.545300E+00 | loss scale: 32768.0 | grad norm: 209680.886 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4432/ 159576 | consumed samples: 94608 | elapsed time per iteration (ms): 15487.4 | learning rate: 2.620E-05 | global batch size: 48 | lm loss: 6.439948E+00 | loss scale: 32768.0 | grad norm: 242738.510 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4433/ 159576 | consumed samples: 94656 | elapsed time per iteration (ms): 15516.6 | learning rate: 2.621E-05 | global batch size: 48 | lm loss: 6.392755E+00 | loss scale: 32768.0 | grad norm: 221617.752 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4434/ 159576 | consumed samples: 94704 | elapsed time per iteration (ms): 15531.5 | learning rate: 2.622E-05 | global batch size: 48 | lm loss: 6.430658E+00 | loss scale: 32768.0 | grad norm: 237786.421 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4435/ 159576 | consumed samples: 94752 | elapsed time per iteration (ms): 15905.6 | learning rate: 2.624E-05 | global batch size: 48 | lm loss: 6.556681E+00 | loss scale: 32768.0 | grad norm: 268817.064 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4436/ 159576 | consumed samples: 94800 | elapsed time per iteration (ms): 15557.4 | learning rate: 2.625E-05 | global batch size: 48 | lm loss: 6.284402E+00 | loss scale: 32768.0 | grad norm: 217583.284 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4437/ 159576 | consumed samples: 94848 | elapsed time per iteration (ms): 15418.7 | learning rate: 2.626E-05 | global batch size: 48 | lm loss: 6.449813E+00 | loss scale: 32768.0 | grad norm: 250831.113 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4438/ 159576 | consumed samples: 94896 | elapsed time per iteration (ms): 15465.2 | learning 
rate: 2.628E-05 | global batch size: 48 | lm loss: 6.524204E+00 | loss scale: 32768.0 | grad norm: 237741.486 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4439/ 159576 | consumed samples: 94944 | elapsed time per iteration (ms): 15664.4 | learning rate: 2.629E-05 | global batch size: 48 | lm loss: 6.426958E+00 | loss scale: 32768.0 | grad norm: 275670.650 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4440/ 159576 | consumed samples: 94992 | elapsed time per iteration (ms): 15485.6 | learning rate: 2.630E-05 | global batch size: 48 | lm loss: 6.312765E+00 | loss scale: 32768.0 | grad norm: 236643.110 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4441/ 159576 | consumed samples: 95040 | elapsed time per iteration (ms): 15554.2 | learning rate: 2.632E-05 | global batch size: 48 | lm loss: 6.353696E+00 | loss scale: 32768.0 | grad norm: 244108.176 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4442/ 159576 | consumed samples: 95088 | elapsed time per iteration (ms): 15559.7 | learning rate: 2.633E-05 | global batch size: 48 | lm loss: 6.390371E+00 | loss scale: 32768.0 | grad norm: 415315.134 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4443/ 159576 | consumed samples: 95136 | elapsed time per iteration (ms): 15762.5 | learning rate: 2.634E-05 | global batch size: 48 | lm loss: 6.406565E+00 | loss scale: 32768.0 | grad norm: 379916.616 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4444/ 159576 | consumed samples: 95184 | elapsed time per iteration (ms): 15453.3 | learning rate: 2.636E-05 | global batch size: 48 | lm loss: 6.429417E+00 | loss scale: 32768.0 | grad norm: 221219.524 | num zeros: 0.0 | number of skipped iterations: 0 | 
number of nan iterations: 0 | time (ms) iteration 4445/ 159576 | consumed samples: 95232 | elapsed time per iteration (ms): 15417.8 | learning rate: 2.637E-05 | global batch size: 48 | lm loss: 6.443903E+00 | loss scale: 32768.0 | grad norm: 296633.656 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4446/ 159576 | consumed samples: 95280 | elapsed time per iteration (ms): 15443.7 | learning rate: 2.638E-05 | global batch size: 48 | lm loss: 6.532698E+00 | loss scale: 32768.0 | grad norm: 269367.053 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4447/ 159576 | consumed samples: 95328 | elapsed time per iteration (ms): 15690.5 | learning rate: 2.640E-05 | global batch size: 48 | lm loss: 6.390007E+00 | loss scale: 32768.0 | grad norm: 235234.160 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4448/ 159576 | consumed samples: 95376 | elapsed time per iteration (ms): 15488.0 | learning rate: 2.641E-05 | global batch size: 48 | lm loss: 6.393896E+00 | loss scale: 32768.0 | grad norm: 210963.912 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4449/ 159576 | consumed samples: 95424 | elapsed time per iteration (ms): 15546.6 | learning rate: 2.642E-05 | global batch size: 48 | lm loss: 6.387472E+00 | loss scale: 32768.0 | grad norm: 214989.320 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4450/ 159576 | consumed samples: 95472 | elapsed time per iteration (ms): 15940.5 | learning rate: 2.644E-05 | global batch size: 48 | lm loss: 6.395288E+00 | loss scale: 32768.0 | grad norm: 214649.184 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4451/ 159576 | consumed samples: 95520 | elapsed time per iteration (ms): 15450.6 | learning rate: 2.645E-05 | 
global batch size: 48 | lm loss: 6.391924E+00 | loss scale: 32768.0 | grad norm: 256872.340 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4452/ 159576 | consumed samples: 95568 | elapsed time per iteration (ms): 15411.8 | learning rate: 2.646E-05 | global batch size: 48 | lm loss: 6.372116E+00 | loss scale: 32768.0 | grad norm: 227618.006 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4453/ 159576 | consumed samples: 95616 | elapsed time per iteration (ms): 15430.5 | learning rate: 2.648E-05 | global batch size: 48 | lm loss: 6.411846E+00 | loss scale: 32768.0 | grad norm: 239941.344 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4454/ 159576 | consumed samples: 95664 | elapsed time per iteration (ms): 15763.6 | learning rate: 2.649E-05 | global batch size: 48 | lm loss: 6.412562E+00 | loss scale: 32768.0 | grad norm: 229907.704 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4455/ 159576 | consumed samples: 95712 | elapsed time per iteration (ms): 15524.7 | learning rate: 2.650E-05 | global batch size: 48 | lm loss: 6.428136E+00 | loss scale: 32768.0 | grad norm: 223866.778 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4456/ 159576 | consumed samples: 95760 | elapsed time per iteration (ms): 15490.3 | learning rate: 2.652E-05 | global batch size: 48 | lm loss: 6.476852E+00 | loss scale: 32768.0 | grad norm: 263813.676 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4457/ 159576 | consumed samples: 95808 | elapsed time per iteration (ms): 15514.4 | learning rate: 2.653E-05 | global batch size: 48 | lm loss: 6.382901E+00 | loss scale: 32768.0 | grad norm: 257590.659 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan 
iterations: 0 | time (ms)
iteration 4458/ 159576 | consumed samples: 95856 | elapsed time per iteration (ms): 15907.9 | learning rate: 2.654E-05 | global batch size: 48 | lm loss: 6.444118E+00 | loss scale: 32768.0 | grad norm: 236507.018 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4459/ 159576 | consumed samples: 95904 | elapsed time per iteration (ms): 15454.4 | learning rate: 2.656E-05 | global batch size: 48 | lm loss: 6.392717E+00 | loss scale: 32768.0 | grad norm: 227300.988 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4460/ 159576 | consumed samples: 95952 | elapsed time per iteration (ms): 15435.7 | learning rate: 2.657E-05 | global batch size: 48 | lm loss: 6.375526E+00 | loss scale: 32768.0 | grad norm: 217329.765 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4461/ 159576 | consumed samples: 96000 | elapsed time per iteration (ms): 15463.0 | learning rate: 2.658E-05 | global batch size: 48 | lm loss: 6.442908E+00 | loss scale: 32768.0 | grad norm: 210214.078 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4462/ 159576 | consumed samples: 96048 | elapsed time per iteration (ms): 15890.8 | learning rate: 2.660E-05 | global batch size: 48 | lm loss: 6.347652E+00 | loss scale: 32768.0 | grad norm: 241592.870 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4463/ 159576 | consumed samples: 96096 | elapsed time per iteration (ms): 15523.3 | learning rate: 2.661E-05 | global batch size: 48 | lm loss: 6.408596E+00 | loss scale: 32768.0 | grad norm: 286741.620 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4464/ 159576 | consumed samples: 96144 | elapsed time per iteration (ms): 15484.1 | learning rate: 2.662E-05 | global batch size: 48 | lm loss: 6.423483E+00 | loss scale: 32768.0 | grad norm: 227347.115 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4465/ 159576 | consumed samples: 96192 | elapsed time per iteration (ms): 15505.4 | learning rate: 2.664E-05 | global batch size: 48 | lm loss: 6.465323E+00 | loss scale: 32768.0 | grad norm: 278891.247 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4466/ 159576 | consumed samples: 96240 | elapsed time per iteration (ms): 15734.3 | learning rate: 2.665E-05 | global batch size: 48 | lm loss: 6.540909E+00 | loss scale: 32768.0 | grad norm: 271330.289 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4467/ 159576 | consumed samples: 96288 | elapsed time per iteration (ms): 15463.2 | learning rate: 2.666E-05 | global batch size: 48 | lm loss: 6.366038E+00 | loss scale: 32768.0 | grad norm: 230305.551 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4468/ 159576 | consumed samples: 96336 | elapsed time per iteration (ms): 15456.1 | learning rate: 2.668E-05 | global batch size: 48 | lm loss: 6.383101E+00 | loss scale: 32768.0 | grad norm: 266194.555 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4469/ 159576 | consumed samples: 96384 | elapsed time per iteration (ms): 15450.4 | learning rate: 2.669E-05 | global batch size: 48 | lm loss: 6.383107E+00 | loss scale: 32768.0 | grad norm: 224990.535 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4470/ 159576 | consumed samples: 96432 | elapsed time per iteration (ms): 15624.0 | learning rate: 2.670E-05 | global batch size: 48 | lm loss: 6.393697E+00 | loss scale: 32768.0 | grad norm: 301446.071 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4471/ 159576 | consumed samples: 96480 | elapsed time per iteration (ms): 15530.2 | learning rate: 2.672E-05 | global batch size: 48 | lm loss: 6.364079E+00 | loss scale: 32768.0 | grad norm: 215922.999 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4472/ 159576 | consumed samples: 96528 | elapsed time per iteration (ms): 15512.2 | learning rate: 2.673E-05 | global batch size: 48 | lm loss: 6.373242E+00 | loss scale: 32768.0 | grad norm: 297810.241 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4473/ 159576 | consumed samples: 96576 | elapsed time per iteration (ms): 15493.5 | learning rate: 2.674E-05 | global batch size: 48 | lm loss: 6.458824E+00 | loss scale: 32768.0 | grad norm: 253875.814 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4474/ 159576 | consumed samples: 96624 | elapsed time per iteration (ms): 16109.8 | learning rate: 2.676E-05 | global batch size: 48 | lm loss: 6.444027E+00 | loss scale: 32768.0 | grad norm: 235767.912 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4475/ 159576 | consumed samples: 96672 | elapsed time per iteration (ms): 15442.4 | learning rate: 2.677E-05 | global batch size: 48 | lm loss: 6.379702E+00 | loss scale: 32768.0 | grad norm: 200816.895 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4476/ 159576 | consumed samples: 96720 | elapsed time per iteration (ms): 15439.1 | learning rate: 2.678E-05 | global batch size: 48 | lm loss: 6.460698E+00 | loss scale: 32768.0 | grad norm: 243887.532 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4477/ 159576 | consumed samples: 96768 | elapsed time per iteration (ms): 15842.8 | learning rate: 2.680E-05 | global batch size: 48 | lm loss: 6.425824E+00 | loss scale: 32768.0 | grad norm: 194209.566 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4478/ 159576 | consumed samples: 96816 | elapsed time per iteration (ms): 15527.8 | learning rate: 2.681E-05 | global batch size: 48 | lm loss: 6.499928E+00 | loss scale: 32768.0 | grad norm: 205164.907 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4479/ 159576 | consumed samples: 96864 | elapsed time per iteration (ms): 15497.3 | learning rate: 2.682E-05 | global batch size: 48 | lm loss: 6.333491E+00 | loss scale: 32768.0 | grad norm: 198136.402 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4480/ 159576 | consumed samples: 96912 | elapsed time per iteration (ms): 15608.5 | learning rate: 2.684E-05 | global batch size: 48 | lm loss: 6.393649E+00 | loss scale: 32768.0 | grad norm: 226765.459 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4481/ 159576 | consumed samples: 96960 | elapsed time per iteration (ms): 15886.4 | learning rate: 2.685E-05 | global batch size: 48 | lm loss: 6.315465E+00 | loss scale: 32768.0 | grad norm: 233990.065 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4482/ 159576 | consumed samples: 97008 | elapsed time per iteration (ms): 15388.4 | learning rate: 2.686E-05 | global batch size: 48 | lm loss: 6.467194E+00 | loss scale: 32768.0 | grad norm: 253595.606 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4483/ 159576 | consumed samples: 97056 | elapsed time per iteration (ms): 15452.6 | learning rate: 2.688E-05 | global batch size: 48 | lm loss: 6.424766E+00 | loss scale: 32768.0 | grad norm: 243792.882 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4484/ 159576 | consumed samples: 97104 | elapsed time per iteration (ms): 15440.8 | learning rate: 2.689E-05 | global batch size: 48 | lm loss: 6.382202E+00 | loss scale: 32768.0 | grad norm: 253619.641 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4485/ 159576 | consumed samples: 97152 | elapsed time per iteration (ms): 15758.4 | learning rate: 2.690E-05 | global batch size: 48 | lm loss: 6.420368E+00 | loss scale: 32768.0 | grad norm: 270122.233 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4486/ 159576 | consumed samples: 97200 | elapsed time per iteration (ms): 15504.2 | learning rate: 2.692E-05 | global batch size: 48 | lm loss: 6.341059E+00 | loss scale: 32768.0 | grad norm: 264076.223 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4487/ 159576 | consumed samples: 97248 | elapsed time per iteration (ms): 15564.4 | learning rate: 2.693E-05 | global batch size: 48 | lm loss: 6.351835E+00 | loss scale: 32768.0 | grad norm: 254803.371 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4488/ 159576 | consumed samples: 97296 | elapsed time per iteration (ms): 15603.6 | learning rate: 2.694E-05 | global batch size: 48 | lm loss: 6.344017E+00 | loss scale: 32768.0 | grad norm: 244790.218 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4489/ 159576 | consumed samples: 97344 | elapsed time per iteration (ms): 15804.2 | learning rate: 2.696E-05 | global batch size: 48 | lm loss: 6.487484E+00 | loss scale: 32768.0 | grad norm: 242539.962 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4490/ 159576 | consumed samples: 97392 | elapsed time per iteration (ms): 15547.3 | learning rate: 2.697E-05 | global batch size: 48 | lm loss: 6.339984E+00 | loss scale: 32768.0 | grad norm: 225575.703 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4491/ 159576 | consumed samples: 97440 | elapsed time per iteration (ms): 15475.7 | learning rate: 2.698E-05 | global batch size: 48 | lm loss: 6.449341E+00 | loss scale: 32768.0 | grad norm: 205395.664 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4492/ 159576 | consumed samples: 97488 | elapsed time per iteration (ms): 15436.0 | learning rate: 2.700E-05 | global batch size: 48 | lm loss: 6.382250E+00 | loss scale: 32768.0 | grad norm: 234078.700 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4493/ 159576 | consumed samples: 97536 | elapsed time per iteration (ms): 15764.8 | learning rate: 2.701E-05 | global batch size: 48 | lm loss: 6.425200E+00 | loss scale: 32768.0 | grad norm: 247476.491 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4494/ 159576 | consumed samples: 97584 | elapsed time per iteration (ms): 15532.5 | learning rate: 2.702E-05 | global batch size: 48 | lm loss: 6.381852E+00 | loss scale: 32768.0 | grad norm: 242648.930 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4495/ 159576 | consumed samples: 97632 | elapsed time per iteration (ms): 15533.1 | learning rate: 2.704E-05 | global batch size: 48 | lm loss: 6.230868E+00 | loss scale: 32768.0 | grad norm: 219731.182 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4496/ 159576 | consumed samples: 97680 | elapsed time per iteration (ms): 15535.3 | learning rate: 2.705E-05 | global batch size: 48 | lm loss: 6.353293E+00 | loss scale: 32768.0 | grad norm: 216013.670 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4497/ 159576 | consumed samples: 97728 | elapsed time per iteration (ms): 15701.4 | learning rate: 2.706E-05 | global batch size: 48 | lm loss: 6.307485E+00 | loss scale: 32768.0 | grad norm: 253204.960 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4498/ 159576 | consumed samples: 97776 | elapsed time per iteration (ms): 15494.7 | learning rate: 2.708E-05 | global batch size: 48 | lm loss: 6.438371E+00 | loss scale: 32768.0 | grad norm: 214787.319 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4499/ 159576 | consumed samples: 97824 | elapsed time per iteration (ms): 15439.3 | learning rate: 2.709E-05 | global batch size: 48 | lm loss: 6.311467E+00 | loss scale: 32768.0 | grad norm: 216077.744 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4500/ 159576 | consumed samples: 97872 | elapsed time per iteration (ms): 15685.9 | learning rate: 2.710E-05 | global batch size: 48 | lm loss: 6.423208E+00 | loss scale: 32768.0 | grad norm: 207994.297 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
saving checkpoint at iteration 4500 to /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints
[2021-09-24 19:52:27,219] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/global_step4500/mp_rank_00_model_states.pt
successfully saved checkpoint at iteration 4500 to /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints
time (ms) | save-checkpoint: 17750.13
iteration 4501/ 159576 | consumed samples: 97920 | elapsed time per iteration (ms): 33282.4 | learning rate: 2.712E-05 | global batch size: 48 | lm loss: 6.402827E+00 | loss scale: 32768.0 | grad norm: 259915.027 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4502/ 159576 | consumed samples: 97968 | elapsed time per iteration (ms): 15581.1 | learning rate: 2.713E-05 | global batch size: 48 | lm loss: 6.310410E+00 | loss scale: 32768.0 | grad norm: 222384.313 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4503/ 159576 | consumed samples: 98016 | elapsed time per iteration (ms): 15856.7 | learning rate: 2.714E-05 | global batch size: 48 | lm loss: 6.259107E+00 | loss scale: 32768.0 | grad norm: 219981.429 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4504/ 159576 | consumed samples: 98064 | elapsed time per iteration (ms): 15522.8 | learning rate: 2.716E-05 | global batch size: 48 | lm loss: 6.441791E+00 | loss scale: 32768.0 | grad norm: 235487.992 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4505/ 159576 | consumed samples: 98112 | elapsed time per iteration (ms): 15475.3 | learning rate: 2.717E-05 | global batch size: 48 | lm loss: 6.431644E+00 | loss scale: 32768.0 | grad norm: 308152.550 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4506/ 159576 | consumed samples: 98160 | elapsed time per iteration (ms): 15475.2 | learning rate: 2.718E-05 | global batch size: 48 | lm loss: 6.437158E+00 | loss scale: 32768.0 | grad norm: 223087.933 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4507/ 159576 | consumed samples: 98208 | elapsed time per iteration (ms): 15919.3 | learning rate: 2.720E-05 | global batch size: 48 | lm loss: 6.456445E+00 | loss scale: 32768.0 | grad norm: 223422.565 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4508/ 159576 | consumed samples: 98256 | elapsed time per iteration (ms): 15503.1 | learning rate: 2.721E-05 | global batch size: 48 | lm loss: 6.409997E+00 | loss scale: 32768.0 | grad norm: 245785.630 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4509/ 159576 | consumed samples: 98304 | elapsed time per iteration (ms): 15512.1 | learning rate: 2.722E-05 | global batch size: 48 | lm loss: 6.441339E+00 | loss scale: 32768.0 | grad norm: 283619.839 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4510/ 159576 | consumed samples: 98352 | elapsed time per iteration (ms): 15548.0 | learning rate: 2.724E-05 | global batch size: 48 | lm loss: 6.441983E+00 | loss scale: 32768.0 | grad norm: 235037.042 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4511/ 159576 | consumed samples: 98400 | elapsed time per iteration (ms): 15735.6 | learning rate: 2.725E-05 | global batch size: 48 | lm loss: 6.499406E+00 | loss scale: 32768.0 | grad norm: 238925.774 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4512/ 159576 | consumed samples: 98448 | elapsed time per iteration (ms): 15495.6 | learning rate: 2.726E-05 | global batch size: 48 | lm loss: 6.429494E+00 | loss scale: 32768.0 | grad norm: 295604.429 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4513/ 159576 | consumed samples: 98496 | elapsed time per iteration (ms): 15481.9 | learning rate: 2.728E-05 | global batch size: 48 | lm loss: 6.407839E+00 | loss scale: 32768.0 | grad norm: 292842.531 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4514/ 159576 | consumed samples: 98544 | elapsed time per iteration (ms): 15479.3 | learning rate: 2.729E-05 | global batch size: 48 | lm loss: 6.440022E+00 | loss scale: 32768.0 | grad norm: 270315.805 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4515/ 159576 | consumed samples: 98592 | elapsed time per iteration (ms): 15606.8 | learning rate: 2.730E-05 | global batch size: 48 | lm loss: 6.391658E+00 | loss scale: 32768.0 | grad norm: 271519.155 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4516/ 159576 | consumed samples: 98640 | elapsed time per iteration (ms): 15492.8 | learning rate: 2.732E-05 | global batch size: 48 | lm loss: 6.445361E+00 | loss scale: 32768.0 | grad norm: 235853.751 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4517/ 159576 | consumed samples: 98688 | elapsed time per iteration (ms): 15525.5 | learning rate: 2.733E-05 | global batch size: 48 | lm loss: 6.274318E+00 | loss scale: 32768.0 | grad norm: 246250.889 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4518/ 159576 | consumed samples: 98736 | elapsed time per iteration (ms): 15595.2 | learning rate: 2.734E-05 | global batch size: 48 | lm loss: 6.378585E+00 | loss scale: 32768.0 | grad norm: 262163.945 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4519/ 159576 | consumed samples: 98784 | elapsed time per iteration (ms): 15657.4 | learning rate: 2.736E-05 | global batch size: 48 | lm loss: 6.398365E+00 | loss scale: 32768.0 | grad norm: 339087.705 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4520/ 159576 | consumed samples: 98832 | elapsed time per iteration (ms): 15503.5 | learning rate: 2.737E-05 | global batch size: 48 | lm loss: 6.435692E+00 | loss scale: 32768.0 | grad norm: 219944.197 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4521/ 159576 | consumed samples: 98880 | elapsed time per iteration (ms): 15444.3 | learning rate: 2.738E-05 | global batch size: 48 | lm loss: 6.418158E+00 | loss scale: 32768.0 | grad norm: 295809.324 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4522/ 159576 | consumed samples: 98928 | elapsed time per iteration (ms): 15726.5 | learning rate: 2.739E-05 | global batch size: 48 | lm loss: 6.317287E+00 | loss scale: 32768.0 | grad norm: 256139.821 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4523/ 159576 | consumed samples: 98976 | elapsed time per iteration (ms): 15697.5 | learning rate: 2.741E-05 | global batch size: 48 | lm loss: 6.210083E+00 | loss scale: 32768.0 | grad norm: 222390.085 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4524/ 159576 | consumed samples: 99024 | elapsed time per iteration (ms): 15483.9 | learning rate: 2.742E-05 | global batch size: 48 | lm loss: 6.357608E+00 | loss scale: 32768.0 | grad norm: 250631.340 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4525/ 159576 | consumed samples: 99072 | elapsed time per iteration (ms): 15498.9 | learning rate: 2.743E-05 | global batch size: 48 | lm loss: 6.439158E+00 | loss scale: 32768.0 | grad norm: 237183.590 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4526/ 159576 | consumed samples: 99120 | elapsed time per iteration (ms): 15870.3 | learning rate: 2.745E-05 | global batch size: 48 | lm loss: 6.477302E+00 | loss scale: 32768.0 | grad norm: 234590.425 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4527/ 159576 | consumed samples: 99168 | elapsed time per iteration (ms): 15527.5 | learning rate: 2.746E-05 | global batch size: 48 | lm loss: 6.404512E+00 | loss scale: 32768.0 | grad norm: 268737.102 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4528/ 159576 | consumed samples: 99216 | elapsed time per iteration (ms): 15477.7 | learning rate: 2.747E-05 | global batch size: 48 | lm loss: 6.357052E+00 | loss scale: 32768.0 | grad norm: 199055.934 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4529/ 159576 | consumed samples: 99264 | elapsed time per iteration (ms): 15441.0 | learning rate: 2.749E-05 | global batch size: 48 | lm loss: 6.418729E+00 | loss scale: 32768.0 | grad norm: 280337.259 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4530/ 159576 | consumed samples: 99312 | elapsed time per iteration (ms): 15870.6 | learning rate: 2.750E-05 | global batch size: 48 | lm loss: 6.394526E+00 | loss scale: 32768.0 | grad norm: 242159.812 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4531/ 159576 | consumed samples: 99360 | elapsed time per iteration (ms): 15356.1 | learning rate: 2.751E-05 | global batch size: 48 | lm loss: 6.454551E+00 | loss scale: 32768.0 | grad norm: 238356.429 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4532/ 159576 | consumed samples: 99408 | elapsed time per iteration (ms): 15481.2 | learning rate: 2.753E-05 | global batch size: 48 | lm loss: 6.479828E+00 | loss scale: 32768.0 | grad norm: 256781.681 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4533/ 159576 | consumed samples: 99456 | elapsed time per iteration (ms): 15512.7 | learning rate: 2.754E-05 | global batch size: 48 | lm loss: 6.347847E+00 | loss scale: 32768.0 | grad norm: 232593.280 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4534/ 159576 | consumed samples: 99504 | elapsed time per iteration (ms): 16020.6 | learning rate: 2.755E-05 | global batch size: 48 | lm loss: 6.361287E+00 | loss scale: 32768.0 | grad norm: 214859.706 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4535/ 159576 | consumed samples: 99552 | elapsed time per iteration (ms): 15687.2 | learning rate: 2.757E-05 | global batch size: 48 | lm loss: 6.344873E+00 | loss scale: 32768.0 | grad norm: 214653.297 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4536/ 159576 | consumed samples: 99600 | elapsed time per iteration (ms): 15424.3 | learning rate: 2.758E-05 | global batch size: 48 | lm loss: 6.273855E+00 | loss scale: 32768.0 | grad norm: 249309.228 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4537/ 159576 | consumed samples: 99648 | elapsed time per iteration (ms): 15440.3 | learning rate: 2.759E-05 | global batch size: 48 | lm loss: 6.373835E+00 | loss scale: 32768.0 | grad norm: 230963.275 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4538/ 159576 | consumed samples: 99696 | elapsed time per iteration (ms): 15788.5 | learning rate: 2.761E-05 | global batch size: 48 | lm loss: 6.381639E+00 | loss scale: 32768.0 | grad norm: 258586.304 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4539/ 159576 | consumed samples: 99744 | elapsed time per iteration (ms): 15436.7 | learning rate: 2.762E-05 | global batch size: 48 | lm loss: 6.464207E+00 | loss scale: 32768.0 | grad norm: 260715.522 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4540/ 159576 | consumed samples: 99792 | elapsed time per iteration (ms): 15631.9 | learning rate: 2.763E-05 | global batch size: 48 | lm loss: 6.282461E+00 | loss scale: 32768.0 | grad norm: 271394.559 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4541/ 159576 | consumed samples: 99840 | elapsed time per iteration (ms): 15417.1 | learning rate: 2.765E-05 | global batch size: 48 | lm loss: 6.323977E+00 | loss scale: 32768.0 | grad norm: 268740.684 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4542/ 159576 | consumed samples: 99888 | elapsed time per iteration (ms): 15726.7 | learning rate: 2.766E-05 | global batch size: 48 | lm loss: 6.419955E+00 | loss scale: 32768.0 | grad norm: 270171.155 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4543/ 159576 | consumed samples: 99936 | elapsed time per iteration (ms): 15524.6 | learning rate: 2.767E-05 | global batch size: 48 | lm loss: 6.456992E+00 | loss scale: 32768.0 | grad norm: 255182.014 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4544/ 159576 | consumed samples: 99984 | elapsed time per iteration (ms): 15442.0 | learning rate: 2.769E-05 | global batch size: 48 | lm loss: 6.327838E+00 | loss scale: 32768.0 | grad norm: 224129.919 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4545/ 159576 | consumed samples: 100032 | elapsed time per iteration (ms): 15419.1 | learning rate: 2.770E-05 | global batch size: 48 | lm loss: 6.374109E+00 | loss scale: 32768.0 | grad norm: 265872.290 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4546/ 159576 | consumed samples: 100080 | elapsed time per iteration (ms): 15626.3 | learning rate: 2.771E-05 | global batch size: 48 | lm loss: 6.332025E+00 | loss scale: 32768.0 | grad norm: 221965.501 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4547/ 159576 | consumed samples: 100128 | elapsed time per iteration (ms): 15454.8 | learning rate: 2.773E-05 | global batch size: 48 | lm loss: 6.399364E+00 | loss scale: 32768.0 | grad norm: 257839.194 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4548/ 159576 | consumed samples: 100176 | elapsed time per iteration (ms): 15431.4 | learning rate: 2.774E-05 | global batch size: 48 | lm loss: 6.411947E+00 | loss scale: 32768.0 | grad norm: 278135.374 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4549/ 159576 | consumed samples: 100224 | elapsed time per iteration (ms): 15844.6 | learning rate: 2.775E-05 | global batch size: 48 | lm loss: 6.477700E+00 | loss scale: 32768.0 | grad norm: 277855.734 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4550/ 159576 | consumed samples: 100272 | elapsed time per iteration (ms): 15537.3 | learning rate: 2.777E-05 | global batch size: 48 | lm loss: 6.526390E+00 | loss scale: 32768.0 | grad norm: 246063.278 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4551/ 159576 | consumed samples: 100320 | elapsed time per iteration (ms): 15431.5 | learning rate: 2.778E-05 | global batch size: 48 | lm loss: 6.391055E+00 | loss scale: 32768.0 | grad norm: 230174.666 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4552/ 159576 | consumed samples: 100368 | elapsed time per iteration (ms): 15392.1 | learning rate: 2.779E-05 | global batch size: 48 | lm loss: 6.381279E+00 | loss scale: 32768.0 | grad norm: 230427.324 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4553/ 159576 | consumed samples: 100416 | elapsed time per iteration (ms): 15770.9 | learning rate: 2.781E-05 | global batch size: 48 | lm loss: 6.438869E+00 | loss scale: 32768.0 | grad norm: 230488.615 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4554/ 159576 | consumed samples: 100464 | elapsed time per iteration (ms): 15447.7 | learning rate: 2.782E-05 | global batch size: 48 | lm loss: 6.245214E+00 | loss scale: 32768.0 | grad norm: 277295.687 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4555/ 159576 | consumed samples: 100512 | elapsed time per iteration (ms): 15446.9 | learning rate: 2.783E-05 | global batch size: 48 | lm loss: 6.413427E+00 | loss scale: 32768.0 | grad norm: 223183.539 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
[2021-09-24 20:07:07] PULSE: tr8-104B is waiting for the previous job to finish before scheduling a new one using the dependency mechanism (1165978_[1-10%1] on 'gpu_p13' partition)
[2021-09-24 20:07:07] PULSE: tr8-104B is running for 14:14:56 since 2021-09-24T05:52:11 (1162855_1 on 'gpu_p13' partition (r6i4n[5,7],r6i5n[2,7-8],r6i6n[0,2,6],r7i2n[4-5],r7i6n[2-4],r7i7n[7-8],r8i0n[2-3,5-8],r8i1n[0,2-4],r8i2n8,r8i3n[0-2],r8i5n[3-4],r8i7n[3-8],r9i0n[0-2],r9i1n[0-3],r9i2n[3-5,8],r9i3n[0-1,7-8],r9i4n[0-2],r9i5n[3-8],r9i6n[0,7-8])
iteration 4556/ 159576 | consumed samples: 100560 | elapsed time per iteration (ms): 15400.2 | learning rate: 2.785E-05 | global batch size: 48 | lm loss: 6.398170E+00 | loss scale: 32768.0 | grad norm: 233778.721 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4557/ 159576 | consumed samples: 100608 | elapsed time per iteration (ms): 15788.3 | learning rate: 2.786E-05 | global batch size: 48 | lm loss: 6.417650E+00 | loss scale: 32768.0 | grad norm: 311870.109 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4558/ 159576 | consumed samples: 100656 | elapsed time per iteration (ms): 15428.6 | learning rate: 2.787E-05 | global batch size: 48 | lm loss: 6.394480E+00 | loss scale: 32768.0 | grad norm: 234331.495 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4559/ 159576 | consumed samples: 100704 | elapsed time per iteration (ms): 15432.2 | learning rate: 2.789E-05 | global batch size: 48 | lm loss: 6.379920E+00 | loss scale: 32768.0 | grad norm: 256774.134 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4560/ 159576 | consumed samples: 100752 | elapsed time per iteration (ms): 15427.3 | learning rate: 2.790E-05 | global batch size: 48 | lm loss: 6.398593E+00 | loss scale: 32768.0 | grad norm: 244274.326 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4561/ 159576 | consumed samples: 100800 | elapsed time per iteration (ms): 15906.6 | learning rate: 2.791E-05 | global batch size: 48 | lm loss: 6.370606E+00 | loss scale: 32768.0 | grad norm: 239881.224 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4562/ 159576 | consumed samples: 100848 | elapsed time per iteration (ms): 15436.7 | learning rate: 2.793E-05 | global batch size: 48 | lm loss: 6.449897E+00 | loss scale: 32768.0 | grad norm: 244189.290 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4563/ 159576 | consumed samples: 100896 | elapsed time per iteration (ms): 15423.9 | learning rate: 2.794E-05 | global batch size: 48 | lm loss: 6.361297E+00 | loss scale: 32768.0 | grad norm: 214769.520 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4564/ 159576 | consumed samples: 100944 | elapsed time per iteration (ms): 15485.4 | learning rate: 2.795E-05 | global batch size: 48 | lm loss: 6.315623E+00 | loss scale: 32768.0 | grad norm: 238075.723 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4565/ 159576 | consumed samples: 100992 | elapsed time per iteration (ms): 15712.7 | learning rate: 2.797E-05 | global batch size: 48 | lm loss: 6.407779E+00 | loss scale: 32768.0 | grad norm: 219946.422 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4566/ 159576 | consumed samples: 101040 | elapsed time per iteration (ms): 15450.4 | learning rate: 2.798E-05 | global batch size: 48 | lm loss: 6.417436E+00 | loss scale: 32768.0 | grad norm: 240930.366 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4567/ 159576 | consumed samples: 101088 | elapsed time per iteration (ms): 15429.7 | learning rate: 2.799E-05 | global batch size: 48 | lm loss: 6.436010E+00 | loss scale: 32768.0 | grad norm: 314077.087 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4568/ 159576 | consumed samples: 101136 | elapsed time per iteration (ms): 15422.9 | learning rate: 2.801E-05 | global batch size: 48 | lm loss: 6.520737E+00 | loss scale: 32768.0 | grad norm: 274297.002 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4569/ 159576 | consumed samples: 101184 | elapsed time per iteration (ms): 15586.4 | learning rate: 2.802E-05 | global batch size: 48 | lm loss: 6.416994E+00 | loss scale: 32768.0 | grad norm: 231703.132 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4570/ 159576 | consumed samples: 101232 | elapsed time per iteration (ms): 15422.0 | learning rate: 2.803E-05 | global batch size: 48 | lm loss: 6.319811E+00 | loss scale: 32768.0 | grad norm: 231530.726 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4571/ 159576 | consumed samples: 101280 | elapsed time per iteration (ms): 15338.3 | learning rate: 2.805E-05 | global batch size: 48 | lm loss: 6.400026E+00 | loss scale: 32768.0 | grad norm: 257733.850 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4572/ 159576 | consumed samples: 101328 | elapsed time per iteration (ms): 15446.6 | learning rate: 2.806E-05 | global batch size: 48 | lm loss: 6.435762E+00 | loss scale: 32768.0 | grad norm: 268511.480 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4573/ 159576 | consumed samples: 101376 | elapsed time per iteration (ms): 15589.8 | learning rate: 2.807E-05 | global batch size: 48 | lm loss: 6.406414E+00 | loss scale: 32768.0 | grad norm: 233768.669 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4574/ 159576 | consumed samples: 101424 | elapsed time per iteration (ms): 15349.3 | learning rate: 2.809E-05 | global batch size: 48 | lm loss: 6.437346E+00 | loss scale: 32768.0 | grad norm: 269214.009 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4575/ 159576 | consumed samples: 101472 | elapsed time per iteration (ms): 15388.4 | learning rate: 2.810E-05 | global batch size: 48 | lm loss: 6.352981E+00 | loss scale: 32768.0 | grad norm: 243418.743 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4576/ 159576 | consumed samples: 101520 | elapsed time per iteration (ms): 15469.0 | learning rate: 2.811E-05 | global batch size: 48 | lm loss: 6.355519E+00 | loss scale: 32768.0 | grad norm: 255521.793 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4577/ 159576 | consumed samples: 101568 | elapsed time per iteration (ms): 15986.1 | learning rate: 2.813E-05 | global batch size: 48 | lm loss: 6.380365E+00 | loss scale: 32768.0 | grad norm: 263123.213 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4578/ 159576 | consumed samples: 101616 | elapsed time per iteration (ms): 15483.5 | learning rate: 2.814E-05 | global batch size: 48 | lm loss: 6.442792E+00 | loss scale: 32768.0 | grad norm: 264664.009 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4579/ 159576 | consumed samples: 101664 | elapsed time per iteration (ms): 15482.0 | learning rate: 2.815E-05 | global batch size: 48 | lm loss: 6.300795E+00 | loss scale: 32768.0 | grad norm: 263093.923 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4580/ 159576 | consumed samples: 101712 | elapsed time per iteration (ms): 15915.5 | learning rate: 2.817E-05 | global batch size: 48 | lm loss: 6.509340E+00 | loss scale: 32768.0 | grad norm: 325066.014 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4581/ 159576 | consumed samples: 101760 | elapsed time per iteration (ms): 15478.8 | learning rate: 2.818E-05 | global batch size: 48 | lm loss: 6.417569E+00 | loss scale: 32768.0 | grad norm: 317932.491 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4582/ 159576 | consumed samples: 101808 | elapsed time per iteration (ms): 15467.6 | learning rate: 2.819E-05 | global batch size: 48 | lm loss: 6.391977E+00 | loss scale: 32768.0 | grad norm: 265433.359 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4583/ 159576 | consumed samples: 101856 | elapsed time per iteration (ms): 15463.2 | learning rate: 2.821E-05 | global batch size: 48 | lm loss: 6.493138E+00 | loss scale: 32768.0 | grad norm: 262301.719 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4584/ 159576 | consumed samples: 101904 | elapsed time per iteration (ms): 15787.5 | learning rate: 2.822E-05 | global batch size: 48 | lm loss: 6.358137E+00 | loss scale: 32768.0 | grad norm: 302003.298 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4585/ 159576 | consumed samples: 101952 | elapsed time per iteration (ms): 15486.8 | learning rate: 2.823E-05 | global batch size: 48 | lm loss: 6.398649E+00 | loss scale: 32768.0 | grad norm: 241427.078 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4586/ 159576 | consumed samples: 102000 | elapsed time per iteration (ms): 15502.1 | learning rate: 2.825E-05 | global batch size: 48 | lm loss: 6.450002E+00 | loss scale: 32768.0 | grad norm: 288231.307 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4587/ 159576 | consumed samples: 102048 | elapsed time per iteration (ms): 15613.4 | learning rate: 2.826E-05 | global batch size: 48 | lm loss: 6.463566E+00 | loss scale: 32768.0 | grad norm: 255700.156 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4588/ 159576 | consumed samples: 102096 | elapsed time per iteration (ms): 16100.7 | learning rate: 2.827E-05 | global batch size: 48 | lm loss: 6.440113E+00 | loss scale: 32768.0 | grad norm: 228589.163 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4589/ 159576 | consumed samples: 102144 | elapsed time per iteration (ms): 15550.6 | learning rate: 2.829E-05 | global batch size: 48 | lm loss: 6.330764E+00 | loss scale: 32768.0 | grad norm: 253562.437 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4590/ 159576 | consumed samples: 102192 | elapsed time per iteration (ms): 15504.0 | learning rate: 2.830E-05 | global batch size: 48 | lm loss: 6.565317E+00 | loss scale: 32768.0 | grad norm: 248109.457 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4591/ 159576 | consumed samples: 102240 | elapsed time per iteration (ms): 15500.8 | learning rate: 2.831E-05 | global batch size: 48 | lm loss: 6.432470E+00 | loss scale:
32768.0 | grad norm: 258408.480 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4592/ 159576 | consumed samples: 102288 | elapsed time per iteration (ms): 15682.0 | learning rate: 2.833E-05 | global batch size: 48 | lm loss: 6.388723E+00 | loss scale: 32768.0 | grad norm: 255460.696 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4593/ 159576 | consumed samples: 102336 | elapsed time per iteration (ms): 15624.8 | learning rate: 2.834E-05 | global batch size: 48 | lm loss: 6.252523E+00 | loss scale: 32768.0 | grad norm: 247063.847 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4594/ 159576 | consumed samples: 102384 | elapsed time per iteration (ms): 15619.9 | learning rate: 2.835E-05 | global batch size: 48 | lm loss: 6.256584E+00 | loss scale: 32768.0 | grad norm: 252094.746 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4595/ 159576 | consumed samples: 102432 | elapsed time per iteration (ms): 15618.3 | learning rate: 2.837E-05 | global batch size: 48 | lm loss: 6.422144E+00 | loss scale: 32768.0 | grad norm: 327415.393 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4596/ 159576 | consumed samples: 102480 | elapsed time per iteration (ms): 15731.1 | learning rate: 2.838E-05 | global batch size: 48 | lm loss: 6.362859E+00 | loss scale: 32768.0 | grad norm: 271628.783 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4597/ 159576 | consumed samples: 102528 | elapsed time per iteration (ms): 15470.5 | learning rate: 2.839E-05 | global batch size: 48 | lm loss: 6.400634E+00 | loss scale: 32768.0 | grad norm: 270235.866 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4598/ 159576 | 
consumed samples: 102576 | elapsed time per iteration (ms): 15494.8 | learning rate: 2.841E-05 | global batch size: 48 | lm loss: 6.409593E+00 | loss scale: 32768.0 | grad norm: 246051.964 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4599/ 159576 | consumed samples: 102624 | elapsed time per iteration (ms): 15503.4 | learning rate: 2.842E-05 | global batch size: 48 | lm loss: 6.286301E+00 | loss scale: 32768.0 | grad norm: 315951.056 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4600/ 159576 | consumed samples: 102672 | elapsed time per iteration (ms): 15657.8 | learning rate: 2.843E-05 | global batch size: 48 | lm loss: 6.424391E+00 | loss scale: 32768.0 | grad norm: 257970.239 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4601/ 159576 | consumed samples: 102720 | elapsed time per iteration (ms): 15415.9 | learning rate: 2.845E-05 | global batch size: 48 | lm loss: 6.419086E+00 | loss scale: 32768.0 | grad norm: 232614.820 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4602/ 159576 | consumed samples: 102768 | elapsed time per iteration (ms): 15506.4 | learning rate: 2.846E-05 | global batch size: 48 | lm loss: 6.598701E+00 | loss scale: 32768.0 | grad norm: 269465.797 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4603/ 159576 | consumed samples: 102816 | elapsed time per iteration (ms): 15842.0 | learning rate: 2.847E-05 | global batch size: 48 | lm loss: 6.374152E+00 | loss scale: 32768.0 | grad norm: 256871.390 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4604/ 159576 | consumed samples: 102864 | elapsed time per iteration (ms): 15661.0 | learning rate: 2.849E-05 | global batch size: 48 | lm loss: 6.330672E+00 | loss scale: 
32768.0 | grad norm: 261276.305 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4605/ 159576 | consumed samples: 102912 | elapsed time per iteration (ms): 15453.1 | learning rate: 2.850E-05 | global batch size: 48 | lm loss: 6.409989E+00 | loss scale: 32768.0 | grad norm: 213427.896 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4606/ 159576 | consumed samples: 102960 | elapsed time per iteration (ms): 15529.1 | learning rate: 2.851E-05 | global batch size: 48 | lm loss: 6.409967E+00 | loss scale: 32768.0 | grad norm: 343079.843 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4607/ 159576 | consumed samples: 103008 | elapsed time per iteration (ms): 15784.9 | learning rate: 2.853E-05 | global batch size: 48 | lm loss: 6.345381E+00 | loss scale: 32768.0 | grad norm: 288014.524 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4608/ 159576 | consumed samples: 103056 | elapsed time per iteration (ms): 15407.4 | learning rate: 2.854E-05 | global batch size: 48 | lm loss: 6.160167E+00 | loss scale: 32768.0 | grad norm: 236948.790 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4609/ 159576 | consumed samples: 103104 | elapsed time per iteration (ms): 15521.9 | learning rate: 2.855E-05 | global batch size: 48 | lm loss: 6.368454E+00 | loss scale: 32768.0 | grad norm: 346716.620 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4610/ 159576 | consumed samples: 103152 | elapsed time per iteration (ms): 15546.6 | learning rate: 2.857E-05 | global batch size: 48 | lm loss: 6.485950E+00 | loss scale: 32768.0 | grad norm: 249193.625 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4611/ 159576 | 
consumed samples: 103200 | elapsed time per iteration (ms): 15842.5 | learning rate: 2.858E-05 | global batch size: 48 | lm loss: 6.433112E+00 | loss scale: 32768.0 | grad norm: 245691.542 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4612/ 159576 | consumed samples: 103248 | elapsed time per iteration (ms): 15452.2 | learning rate: 2.859E-05 | global batch size: 48 | lm loss: 6.453573E+00 | loss scale: 32768.0 | grad norm: 326844.652 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4613/ 159576 | consumed samples: 103296 | elapsed time per iteration (ms): 15454.7 | learning rate: 2.861E-05 | global batch size: 48 | lm loss: 6.431165E+00 | loss scale: 32768.0 | grad norm: 289334.369 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4614/ 159576 | consumed samples: 103344 | elapsed time per iteration (ms): 15458.5 | learning rate: 2.862E-05 | global batch size: 48 | lm loss: 6.229577E+00 | loss scale: 32768.0 | grad norm: 256574.569 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4615/ 159576 | consumed samples: 103392 | elapsed time per iteration (ms): 15900.6 | learning rate: 2.863E-05 | global batch size: 48 | lm loss: 6.432065E+00 | loss scale: 32768.0 | grad norm: 273324.041 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4616/ 159576 | consumed samples: 103440 | elapsed time per iteration (ms): 15568.2 | learning rate: 2.865E-05 | global batch size: 48 | lm loss: 6.373868E+00 | loss scale: 32768.0 | grad norm: 289471.232 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4617/ 159576 | consumed samples: 103488 | elapsed time per iteration (ms): 15491.7 | learning rate: 2.866E-05 | global batch size: 48 | lm loss: 6.302549E+00 | loss scale: 
32768.0 | grad norm: 421148.983 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4618/ 159576 | consumed samples: 103536 | elapsed time per iteration (ms): 15549.9 | learning rate: 2.867E-05 | global batch size: 48 | lm loss: 6.278319E+00 | loss scale: 32768.0 | grad norm: 346570.622 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4619/ 159576 | consumed samples: 103584 | elapsed time per iteration (ms): 15749.4 | learning rate: 2.869E-05 | global batch size: 48 | lm loss: 6.394638E+00 | loss scale: 32768.0 | grad norm: 356110.872 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4620/ 159576 | consumed samples: 103632 | elapsed time per iteration (ms): 15472.2 | learning rate: 2.870E-05 | global batch size: 48 | lm loss: 6.303448E+00 | loss scale: 32768.0 | grad norm: 328724.972 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4621/ 159576 | consumed samples: 103680 | elapsed time per iteration (ms): 15427.3 | learning rate: 2.871E-05 | global batch size: 48 | lm loss: 6.544609E+00 | loss scale: 32768.0 | grad norm: 324100.834 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4622/ 159576 | consumed samples: 103728 | elapsed time per iteration (ms): 15472.5 | learning rate: 2.873E-05 | global batch size: 48 | lm loss: 6.314513E+00 | loss scale: 32768.0 | grad norm: 275878.819 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4623/ 159576 | consumed samples: 103776 | elapsed time per iteration (ms): 15583.2 | learning rate: 2.874E-05 | global batch size: 48 | lm loss: 6.398262E+00 | loss scale: 32768.0 | grad norm: 263126.230 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4624/ 159576 | 
consumed samples: 103824 | elapsed time per iteration (ms): 15483.7 | learning rate: 2.875E-05 | global batch size: 48 | lm loss: 6.474843E+00 | loss scale: 32768.0 | grad norm: 242329.963 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4625/ 159576 | consumed samples: 103872 | elapsed time per iteration (ms): 15477.6 | learning rate: 2.877E-05 | global batch size: 48 | lm loss: 6.408014E+00 | loss scale: 32768.0 | grad norm: 267696.261 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4626/ 159576 | consumed samples: 103920 | elapsed time per iteration (ms): 15516.2 | learning rate: 2.878E-05 | global batch size: 48 | lm loss: 6.847461E+00 | loss scale: 32768.0 | grad norm: 713094.141 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4627/ 159576 | consumed samples: 103968 | elapsed time per iteration (ms): 15724.2 | learning rate: 2.879E-05 | global batch size: 48 | lm loss: 6.386415E+00 | loss scale: 32768.0 | grad norm: 272846.125 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4628/ 159576 | consumed samples: 104016 | elapsed time per iteration (ms): 15456.1 | learning rate: 2.881E-05 | global batch size: 48 | lm loss: 6.446278E+00 | loss scale: 32768.0 | grad norm: 379795.778 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4629/ 159576 | consumed samples: 104064 | elapsed time per iteration (ms): 15435.5 | learning rate: 2.882E-05 | global batch size: 48 | lm loss: 6.469239E+00 | loss scale: 32768.0 | grad norm: 207715.801 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4630/ 159576 | consumed samples: 104112 | elapsed time per iteration (ms): 15698.1 | learning rate: 2.883E-05 | global batch size: 48 | lm loss: 6.357453E+00 | loss scale: 
32768.0 | grad norm: 236792.203 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4631/ 159576 | consumed samples: 104160 | elapsed time per iteration (ms): 15489.5 | learning rate: 2.885E-05 | global batch size: 48 | lm loss: 6.448473E+00 | loss scale: 32768.0 | grad norm: 225431.411 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4632/ 159576 | consumed samples: 104208 | elapsed time per iteration (ms): 15562.5 | learning rate: 2.886E-05 | global batch size: 48 | lm loss: 6.377034E+00 | loss scale: 32768.0 | grad norm: 375353.459 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4633/ 159576 | consumed samples: 104256 | elapsed time per iteration (ms): 15569.5 | learning rate: 2.887E-05 | global batch size: 48 | lm loss: 6.516908E+00 | loss scale: 32768.0 | grad norm: 333588.373 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4634/ 159576 | consumed samples: 104304 | elapsed time per iteration (ms): 15928.9 | learning rate: 2.889E-05 | global batch size: 48 | lm loss: 6.574339E+00 | loss scale: 32768.0 | grad norm: 243589.856 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4635/ 159576 | consumed samples: 104352 | elapsed time per iteration (ms): 15531.5 | learning rate: 2.890E-05 | global batch size: 48 | lm loss: 6.475029E+00 | loss scale: 32768.0 | grad norm: 442923.681 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4636/ 159576 | consumed samples: 104400 | elapsed time per iteration (ms): 15560.0 | learning rate: 2.891E-05 | global batch size: 48 | lm loss: 6.369026E+00 | loss scale: 32768.0 | grad norm: 295484.961 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4637/ 159576 | 
consumed samples: 104448 | elapsed time per iteration (ms): 15543.7 | learning rate: 2.893E-05 | global batch size: 48 | lm loss: 6.490546E+00 | loss scale: 32768.0 | grad norm: 279233.122 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4638/ 159576 | consumed samples: 104496 | elapsed time per iteration (ms): 15916.4 | learning rate: 2.894E-05 | global batch size: 48 | lm loss: 6.437621E+00 | loss scale: 32768.0 | grad norm: 245214.935 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4639/ 159576 | consumed samples: 104544 | elapsed time per iteration (ms): 15547.5 | learning rate: 2.895E-05 | global batch size: 48 | lm loss: 6.491655E+00 | loss scale: 32768.0 | grad norm: 240217.342 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4640/ 159576 | consumed samples: 104592 | elapsed time per iteration (ms): 15573.7 | learning rate: 2.897E-05 | global batch size: 48 | lm loss: 6.455505E+00 | loss scale: 32768.0 | grad norm: 317400.165 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4641/ 159576 | consumed samples: 104640 | elapsed time per iteration (ms): 15624.7 | learning rate: 2.898E-05 | global batch size: 48 | lm loss: 6.482111E+00 | loss scale: 32768.0 | grad norm: 244102.198 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4642/ 159576 | consumed samples: 104688 | elapsed time per iteration (ms): 16106.5 | learning rate: 2.899E-05 | global batch size: 48 | lm loss: 6.281504E+00 | loss scale: 32768.0 | grad norm: 282861.527 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4643/ 159576 | consumed samples: 104736 | elapsed time per iteration (ms): 15639.7 | learning rate: 2.901E-05 | global batch size: 48 | lm loss: 6.420715E+00 | loss scale: 
32768.0 | grad norm: 274009.202 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4644/ 159576 | consumed samples: 104784 | elapsed time per iteration (ms): 15520.7 | learning rate: 2.902E-05 | global batch size: 48 | lm loss: 6.342989E+00 | loss scale: 32768.0 | grad norm: 226933.382 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4645/ 159576 | consumed samples: 104832 | elapsed time per iteration (ms): 15501.6 | learning rate: 2.903E-05 | global batch size: 48 | lm loss: 6.427937E+00 | loss scale: 32768.0 | grad norm: 278047.939 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4646/ 159576 | consumed samples: 104880 | elapsed time per iteration (ms): 15629.3 | learning rate: 2.905E-05 | global batch size: 48 | lm loss: 6.294481E+00 | loss scale: 32768.0 | grad norm: 235356.190 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4647/ 159576 | consumed samples: 104928 | elapsed time per iteration (ms): 15591.9 | learning rate: 2.906E-05 | global batch size: 48 | lm loss: 6.363388E+00 | loss scale: 32768.0 | grad norm: 600293.405 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4648/ 159576 | consumed samples: 104976 | elapsed time per iteration (ms): 15595.2 | learning rate: 2.907E-05 | global batch size: 48 | lm loss: 6.377505E+00 | loss scale: 32768.0 | grad norm: 331377.856 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4649/ 159576 | consumed samples: 105024 | elapsed time per iteration (ms): 15628.4 | learning rate: 2.909E-05 | global batch size: 48 | lm loss: 6.381812E+00 | loss scale: 32768.0 | grad norm: 200005.238 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4650/ 159576 | 
consumed samples: 105072 | elapsed time per iteration (ms): 15748.7 | learning rate: 2.910E-05 | global batch size: 48 | lm loss: 6.338908E+00 | loss scale: 32768.0 | grad norm: 242913.858 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4651/ 159576 | consumed samples: 105120 | elapsed time per iteration (ms): 15511.3 | learning rate: 2.911E-05 | global batch size: 48 | lm loss: 6.419736E+00 | loss scale: 32768.0 | grad norm: 330409.745 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4652/ 159576 | consumed samples: 105168 | elapsed time per iteration (ms): 15516.3 | learning rate: 2.913E-05 | global batch size: 48 | lm loss: 6.404620E+00 | loss scale: 32768.0 | grad norm: 318144.336 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4653/ 159576 | consumed samples: 105216 | elapsed time per iteration (ms): 15876.3 | learning rate: 2.914E-05 | global batch size: 48 | lm loss: 6.377990E+00 | loss scale: 32768.0 | grad norm: 232202.485 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4654/ 159576 | consumed samples: 105264 | elapsed time per iteration (ms): 15718.5 | learning rate: 2.915E-05 | global batch size: 48 | lm loss: 6.383665E+00 | loss scale: 32768.0 | grad norm: 241524.475 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4655/ 159576 | consumed samples: 105312 | elapsed time per iteration (ms): 15610.4 | learning rate: 2.917E-05 | global batch size: 48 | lm loss: 6.403493E+00 | loss scale: 32768.0 | grad norm: 373231.364 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4656/ 159576 | consumed samples: 105360 | elapsed time per iteration (ms): 15640.8 | learning rate: 2.918E-05 | global batch size: 48 | lm loss: 6.329133E+00 | loss scale: 
32768.0 | grad norm: 286954.758 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4657/ 159576 | consumed samples: 105408 | elapsed time per iteration (ms): 15996.4 | learning rate: 2.919E-05 | global batch size: 48 | lm loss: 6.748344E+00 | loss scale: 32768.0 | grad norm: 260947.100 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4658/ 159576 | consumed samples: 105456 | elapsed time per iteration (ms): 15522.2 | learning rate: 2.921E-05 | global batch size: 48 | lm loss: 6.315388E+00 | loss scale: 32768.0 | grad norm: 279560.800 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4659/ 159576 | consumed samples: 105504 | elapsed time per iteration (ms): 15546.8 | learning rate: 2.922E-05 | global batch size: 48 | lm loss: 6.351707E+00 | loss scale: 32768.0 | grad norm: 270238.544 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4660/ 159576 | consumed samples: 105552 | elapsed time per iteration (ms): 15483.2 | learning rate: 2.923E-05 | global batch size: 48 | lm loss: 6.338678E+00 | loss scale: 32768.0 | grad norm: 299765.314 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4661/ 159576 | consumed samples: 105600 | elapsed time per iteration (ms): 15828.0 | learning rate: 2.925E-05 | global batch size: 48 | lm loss: 6.427124E+00 | loss scale: 32768.0 | grad norm: 302484.019 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4662/ 159576 | consumed samples: 105648 | elapsed time per iteration (ms): 15644.1 | learning rate: 2.926E-05 | global batch size: 48 | lm loss: 6.407690E+00 | loss scale: 32768.0 | grad norm: 286169.997 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4663/ 159576 | 
consumed samples: 105696 | elapsed time per iteration (ms): 15583.7 | learning rate: 2.927E-05 | global batch size: 48 | lm loss: 6.254132E+00 | loss scale: 32768.0 | grad norm: 276778.381 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4664/ 159576 | consumed samples: 105744 | elapsed time per iteration (ms): 15651.6 | learning rate: 2.929E-05 | global batch size: 48 | lm loss: 6.469905E+00 | loss scale: 32768.0 | grad norm: 279741.368 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4665/ 159576 | consumed samples: 105792 | elapsed time per iteration (ms): 15818.3 | learning rate: 2.930E-05 | global batch size: 48 | lm loss: 6.508596E+00 | loss scale: 32768.0 | grad norm: 336670.270 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4666/ 159576 | consumed samples: 105840 | elapsed time per iteration (ms): 15552.5 | learning rate: 2.931E-05 | global batch size: 48 | lm loss: 6.434944E+00 | loss scale: 32768.0 | grad norm: 242396.784 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4667/ 159576 | consumed samples: 105888 | elapsed time per iteration (ms): 15512.6 | learning rate: 2.933E-05 | global batch size: 48 | lm loss: 6.510550E+00 | loss scale: 32768.0 | grad norm: 252220.315 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4668/ 159576 | consumed samples: 105936 | elapsed time per iteration (ms): 15495.7 | learning rate: 2.934E-05 | global batch size: 48 | lm loss: 6.399008E+00 | loss scale: 32768.0 | grad norm: 288495.864 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4669/ 159576 | consumed samples: 105984 | elapsed time per iteration (ms): 15668.5 | learning rate: 2.935E-05 | global batch size: 48 | lm loss: 6.404999E+00 | loss scale: 
32768.0 | grad norm: 244327.032 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4670/ 159576 | consumed samples: 106032 | elapsed time per iteration (ms): 15562.9 | learning rate: 2.937E-05 | global batch size: 48 | lm loss: 6.418772E+00 | loss scale: 32768.0 | grad norm: 313672.915 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4671/ 159576 | consumed samples: 106080 | elapsed time per iteration (ms): 15630.7 | learning rate: 2.938E-05 | global batch size: 48 | lm loss: 6.361070E+00 | loss scale: 32768.0 | grad norm: 276763.857 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4672/ 159576 | consumed samples: 106128 | elapsed time per iteration (ms): 15597.8 | learning rate: 2.939E-05 | global batch size: 48 | lm loss: 6.477580E+00 | loss scale: 32768.0 | grad norm: 230503.822 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4673/ 159576 | consumed samples: 106176 | elapsed time per iteration (ms): 15696.4 | learning rate: 2.941E-05 | global batch size: 48 | lm loss: 6.517149E+00 | loss scale: 32768.0 | grad norm: 217937.765 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4674/ 159576 | consumed samples: 106224 | elapsed time per iteration (ms): 15548.7 | learning rate: 2.942E-05 | global batch size: 48 | lm loss: 6.380251E+00 | loss scale: 32768.0 | grad norm: 267703.433 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4675/ 159576 | consumed samples: 106272 | elapsed time per iteration (ms): 15515.6 | learning rate: 2.943E-05 | global batch size: 48 | lm loss: 6.348250E+00 | loss scale: 32768.0 | grad norm: 309305.174 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4676/ 159576 | 
consumed samples: 106320 | elapsed time per iteration (ms): 15795.7 | learning rate: 2.945E-05 | global batch size: 48 | lm loss: 6.461040E+00 | loss scale: 32768.0 | grad norm: 285074.708 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4677/ 159576 | consumed samples: 106368 | elapsed time per iteration (ms): 15718.4 | learning rate: 2.946E-05 | global batch size: 48 | lm loss: 6.388801E+00 | loss scale: 32768.0 | grad norm: 292644.236 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4678/ 159576 | consumed samples: 106416 | elapsed time per iteration (ms): 15585.4 | learning rate: 2.947E-05 | global batch size: 48 | lm loss: 6.417225E+00 | loss scale: 32768.0 | grad norm: 334812.598 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4679/ 159576 | consumed samples: 106464 | elapsed time per iteration (ms): 15631.1 | learning rate: 2.949E-05 | global batch size: 48 | lm loss: 6.357790E+00 | loss scale: 32768.0 | grad norm: 301017.925 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4680/ 159576 | consumed samples: 106512 | elapsed time per iteration (ms): 15891.7 | learning rate: 2.950E-05 | global batch size: 48 | lm loss: 6.556364E+00 | loss scale: 32768.0 | grad norm: 280065.506 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4681/ 159576 | consumed samples: 106560 | elapsed time per iteration (ms): 15562.2 | learning rate: 2.951E-05 | global batch size: 48 | lm loss: 6.393982E+00 | loss scale: 32768.0 | grad norm: 242731.164 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4682/ 159576 | consumed samples: 106608 | elapsed time per iteration (ms): 15526.5 | learning rate: 2.953E-05 | global batch size: 48 | lm loss: 6.396220E+00 | loss scale: 32768.0 | grad norm: 407344.753 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4683/ 159576 | consumed samples: 106656 | elapsed time per iteration (ms): 15526.3 | learning rate: 2.954E-05 | global batch size: 48 | lm loss: 6.396249E+00 | loss scale: 32768.0 | grad norm: 300342.299 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4684/ 159576 | consumed samples: 106704 | elapsed time per iteration (ms): 15885.4 | learning rate: 2.955E-05 | global batch size: 48 | lm loss: 6.375283E+00 | loss scale: 32768.0 | grad norm: 296501.436 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4685/ 159576 | consumed samples: 106752 | elapsed time per iteration (ms): 15527.4 | learning rate: 2.957E-05 | global batch size: 48 | lm loss: 6.418046E+00 | loss scale: 32768.0 | grad norm: 290100.249 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4686/ 159576 | consumed samples: 106800 | elapsed time per iteration (ms): 15621.1 | learning rate: 2.958E-05 | global batch size: 48 | lm loss: 6.300463E+00 | loss scale: 32768.0 | grad norm: 265814.471 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4687/ 159576 | consumed samples: 106848 | elapsed time per iteration (ms): 15592.0 | learning rate: 2.959E-05 | global batch size: 48 | lm loss: 6.440179E+00 | loss scale: 32768.0 | grad norm: 354690.307 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4688/ 159576 | consumed samples: 106896 | elapsed time per iteration (ms): 15963.5 | learning rate: 2.961E-05 | global batch size: 48 | lm loss: 6.396194E+00 | loss scale: 32768.0 | grad norm: 259594.010 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4689/ 159576 | consumed samples: 106944 | elapsed time per iteration (ms): 15540.2 | learning rate: 2.962E-05 | global batch size: 48 | lm loss: 6.459390E+00 | loss scale: 32768.0 | grad norm: 326661.756 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4690/ 159576 | consumed samples: 106992 | elapsed time per iteration (ms): 15512.7 | learning rate: 2.963E-05 | global batch size: 48 | lm loss: 6.324084E+00 | loss scale: 32768.0 | grad norm: 288829.158 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4691/ 159576 | consumed samples: 107040 | elapsed time per iteration (ms): 8709.6 | learning rate: 2.963E-05 | global batch size: 48 | lm loss: 6.781525E+00 | loss scale: 16384.0 | grad norm: 288829.158 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4692/ 159576 | consumed samples: 107088 | elapsed time per iteration (ms): 15305.7 | learning rate: 2.964E-05 | global batch size: 48 | lm loss: 6.431325E+00 | loss scale: 16384.0 | grad norm: 145022.360 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4693/ 159576 | consumed samples: 107136 | elapsed time per iteration (ms): 15550.9 | learning rate: 2.966E-05 | global batch size: 48 | lm loss: 6.516616E+00 | loss scale: 16384.0 | grad norm: 155613.709 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4694/ 159576 | consumed samples: 107184 | elapsed time per iteration (ms): 15526.9 | learning rate: 2.967E-05 | global batch size: 48 | lm loss: 6.387960E+00 | loss scale: 16384.0 | grad norm: 134461.471 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4695/ 159576 | consumed samples: 107232 | elapsed time per iteration (ms): 15497.0 | learning rate: 2.968E-05 | global batch size: 48 | lm loss: 6.392653E+00 | loss scale: 16384.0 | grad norm: 141822.076 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4696/ 159576 | consumed samples: 107280 | elapsed time per iteration (ms): 15923.9 | learning rate: 2.970E-05 | global batch size: 48 | lm loss: 6.412030E+00 | loss scale: 16384.0 | grad norm: 175057.651 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4697/ 159576 | consumed samples: 107328 | elapsed time per iteration (ms): 15425.2 | learning rate: 2.971E-05 | global batch size: 48 | lm loss: 6.373864E+00 | loss scale: 16384.0 | grad norm: 282779.549 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4698/ 159576 | consumed samples: 107376 | elapsed time per iteration (ms): 15454.6 | learning rate: 2.972E-05 | global batch size: 48 | lm loss: 6.306759E+00 | loss scale: 16384.0 | grad norm: 136700.298 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4699/ 159576 | consumed samples: 107424 | elapsed time per iteration (ms): 15528.9 | learning rate: 2.974E-05 | global batch size: 48 | lm loss: 6.335629E+00 | loss scale: 16384.0 | grad norm: 184501.539 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4700/ 159576 | consumed samples: 107472 | elapsed time per iteration (ms): 15956.8 | learning rate: 2.975E-05 | global batch size: 48 | lm loss: 6.408161E+00 | loss scale: 16384.0 | grad norm: 173148.921 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4701/ 159576 | consumed samples: 107520 | elapsed time per iteration (ms): 15601.2 | learning rate: 2.976E-05 | global batch size: 48 | lm loss: 6.452803E+00 | loss scale: 16384.0 | grad norm: 175212.053 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4702/ 159576 | consumed samples: 107568 | elapsed time per iteration (ms): 15499.9 | learning rate: 2.978E-05 | global batch size: 48 | lm loss: 6.444376E+00 | loss scale: 16384.0 | grad norm: 154484.468 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4703/ 159576 | consumed samples: 107616 | elapsed time per iteration (ms): 15505.8 | learning rate: 2.979E-05 | global batch size: 48 | lm loss: 6.378032E+00 | loss scale: 16384.0 | grad norm: 157853.641 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4704/ 159576 | consumed samples: 107664 | elapsed time per iteration (ms): 15797.2 | learning rate: 2.980E-05 | global batch size: 48 | lm loss: 6.433157E+00 | loss scale: 16384.0 | grad norm: 189038.636 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4705/ 159576 | consumed samples: 107712 | elapsed time per iteration (ms): 15428.0 | learning rate: 2.982E-05 | global batch size: 48 | lm loss: 6.345381E+00 | loss scale: 16384.0 | grad norm: 223066.594 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4706/ 159576 | consumed samples: 107760 | elapsed time per iteration (ms): 15506.2 | learning rate: 2.983E-05 | global batch size: 48 | lm loss: 6.409193E+00 | loss scale: 16384.0 | grad norm: 138366.342 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4707/ 159576 | consumed samples: 107808 | elapsed time per iteration (ms): 15469.9 | learning rate: 2.984E-05 | global batch size: 48 | lm loss: 6.454758E+00 | loss scale: 16384.0 | grad norm: 144072.711 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4708/ 159576 | consumed samples: 107856 | elapsed time per iteration (ms): 15711.5 | learning rate: 2.986E-05 | global batch size: 48 | lm loss: 6.418115E+00 | loss scale: 16384.0 | grad norm: 160060.361 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4709/ 159576 | consumed samples: 107904 | elapsed time per iteration (ms): 15549.5 | learning rate: 2.987E-05 | global batch size: 48 | lm loss: 6.323099E+00 | loss scale: 16384.0 | grad norm: 158794.827 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4710/ 159576 | consumed samples: 107952 | elapsed time per iteration (ms): 15458.0 | learning rate: 2.988E-05 | global batch size: 48 | lm loss: 6.418284E+00 | loss scale: 16384.0 | grad norm: 172985.051 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4711/ 159576 | consumed samples: 108000 | elapsed time per iteration (ms): 15477.2 | learning rate: 2.990E-05 | global batch size: 48 | lm loss: 6.449984E+00 | loss scale: 16384.0 | grad norm: 151942.015 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4712/ 159576 | consumed samples: 108048 | elapsed time per iteration (ms): 15912.6 | learning rate: 2.991E-05 | global batch size: 48 | lm loss: 6.331490E+00 | loss scale: 16384.0 | grad norm: 148710.284 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4713/ 159576 | consumed samples: 108096 | elapsed time per iteration (ms): 15440.5 | learning rate: 2.992E-05 | global batch size: 48 | lm loss: 6.445600E+00 | loss scale: 16384.0 | grad norm: 136119.725 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4714/ 159576 | consumed samples: 108144 | elapsed time per iteration (ms): 15519.8 | learning rate: 2.994E-05 | global batch size: 48 | lm loss: 6.276518E+00 | loss scale: 16384.0 | grad norm: 170811.199 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4715/ 159576 | consumed samples: 108192 | elapsed time per iteration (ms): 15866.2 | learning rate: 2.995E-05 | global batch size: 48 | lm loss: 6.430917E+00 | loss scale: 16384.0 | grad norm: 145058.329 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4716/ 159576 | consumed samples: 108240 | elapsed time per iteration (ms): 15520.8 | learning rate: 2.996E-05 | global batch size: 48 | lm loss: 6.459754E+00 | loss scale: 16384.0 | grad norm: 146862.274 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4717/ 159576 | consumed samples: 108288 | elapsed time per iteration (ms): 15578.0 | learning rate: 2.998E-05 | global batch size: 48 | lm loss: 6.447017E+00 | loss scale: 16384.0 | grad norm: 172505.739 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4718/ 159576 | consumed samples: 108336 | elapsed time per iteration (ms): 15434.8 | learning rate: 2.999E-05 | global batch size: 48 | lm loss: 6.316633E+00 | loss scale: 16384.0 | grad norm: 130149.169 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4719/ 159576 | consumed samples: 108384 | elapsed time per iteration (ms): 15703.7 | learning rate: 3.000E-05 | global batch size: 48 | lm loss: 6.376626E+00 | loss scale: 16384.0 | grad norm: 198273.301 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4720/ 159576 | consumed samples: 108432 | elapsed time per iteration (ms): 15522.7 | learning rate: 3.002E-05 | global batch size: 48 | lm loss: 6.340569E+00 | loss scale: 16384.0 | grad norm: 189583.946 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4721/ 159576 | consumed samples: 108480 | elapsed time per iteration (ms): 15419.9 | learning rate: 3.003E-05 | global batch size: 48 | lm loss: 6.519832E+00 | loss scale: 16384.0 | grad norm: 148280.410 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4722/ 159576 | consumed samples: 108528 | elapsed time per iteration (ms): 15537.6 | learning rate: 3.004E-05 | global batch size: 48 | lm loss: 6.519564E+00 | loss scale: 16384.0 | grad norm: 165136.082 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4723/ 159576 | consumed samples: 108576 | elapsed time per iteration (ms): 15984.2 | learning rate: 3.006E-05 | global batch size: 48 | lm loss: 6.331813E+00 | loss scale: 16384.0 | grad norm: 137134.914 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4724/ 159576 | consumed samples: 108624 | elapsed time per iteration (ms): 15591.8 | learning rate: 3.007E-05 | global batch size: 48 | lm loss: 6.417581E+00 | loss scale: 16384.0 | grad norm: 135525.990 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4725/ 159576 | consumed samples: 108672 | elapsed time per iteration (ms): 15458.7 | learning rate: 3.008E-05 | global batch size: 48 | lm loss: 6.369280E+00 | loss scale: 16384.0 | grad norm: 135730.698 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4726/ 159576 | consumed samples: 108720 | elapsed time per iteration (ms): 15476.9 | learning rate: 3.010E-05 | global batch size: 48 | lm loss: 6.320598E+00 | loss scale: 16384.0 | grad norm: 147233.060 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4727/ 159576 | consumed samples: 108768 | elapsed time per iteration (ms): 15812.7 | learning rate: 3.011E-05 | global batch size: 48 | lm loss: 6.469586E+00 | loss scale: 16384.0 | grad norm: 164519.317 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4728/ 159576 | consumed samples: 108816 | elapsed time per iteration (ms): 15490.9 | learning rate: 3.012E-05 | global batch size: 48 | lm loss: 6.473386E+00 | loss scale: 16384.0 | grad norm: 151619.547 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4729/ 159576 | consumed samples: 108864 | elapsed time per iteration (ms): 15470.7 | learning rate: 3.014E-05 | global batch size: 48 | lm loss: 6.340328E+00 | loss scale: 16384.0 | grad norm: 137036.044 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4730/ 159576 | consumed samples: 108912 | elapsed time per iteration (ms): 15531.2 | learning rate: 3.015E-05 | global batch size: 48 | lm loss: 6.394744E+00 | loss scale: 16384.0 | grad norm: 146186.033 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4731/ 159576 | consumed samples: 108960 | elapsed time per iteration (ms): 15606.4 | learning rate: 3.016E-05 | global batch size: 48 | lm loss: 6.362489E+00 | loss scale: 16384.0 | grad norm: 187444.936 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4732/ 159576 | consumed samples: 109008 | elapsed time per iteration (ms): 15504.3 | learning rate: 3.018E-05 | global batch size: 48 | lm loss: 6.456880E+00 | loss scale: 16384.0 | grad norm: 129595.559 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4733/ 159576 | consumed samples: 109056 | elapsed time per iteration (ms): 15474.7 | learning rate: 3.019E-05 | global batch size: 48 | lm loss: 6.443705E+00 | loss scale: 16384.0 | grad norm: 137176.536 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4734/ 159576 | consumed samples: 109104 | elapsed time per iteration (ms): 15468.7 | learning rate: 3.020E-05 | global batch size: 48 | lm loss: 6.325924E+00 | loss scale: 16384.0 | grad norm: 130886.931 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4735/ 159576 | consumed samples: 109152 | elapsed time per iteration (ms): 15622.9 | learning rate: 3.022E-05 | global batch size: 48 | lm loss: 6.367020E+00 | loss scale: 16384.0 | grad norm: 133365.928 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4736/ 159576 | consumed samples: 109200 | elapsed time per iteration (ms): 15496.0 | learning rate: 3.023E-05 | global batch size: 48 | lm loss: 6.366150E+00 | loss scale: 16384.0 | grad norm: 170880.695 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4737/ 159576 | consumed samples: 109248 | elapsed time per iteration (ms): 15489.1 | learning rate: 3.024E-05 | global batch size: 48 | lm loss: 6.352594E+00 | loss scale: 16384.0 | grad norm: 126383.624 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4738/ 159576 | consumed samples: 109296 | elapsed time per iteration (ms): 15753.5 | learning rate: 3.026E-05 | global batch size: 48 | lm loss: 6.439698E+00 | loss scale: 16384.0 | grad norm: 178764.163 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4739/ 159576 | consumed samples: 109344 | elapsed time per iteration (ms): 15669.9 | learning rate: 3.027E-05 | global batch size: 48 | lm loss: 6.379218E+00 | loss scale: 16384.0 | grad norm: 140248.496 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4740/ 159576 | consumed samples: 109392 | elapsed time per iteration (ms): 15472.2 | learning rate: 3.028E-05 | global batch size: 48 | lm loss: 6.455700E+00 | loss scale: 16384.0 | grad norm: 141297.672 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4741/ 159576 | consumed samples: 109440 | elapsed time per iteration (ms): 15470.3 | learning rate: 3.030E-05 | global batch size: 48 | lm loss: 6.395582E+00 | loss scale: 16384.0 | grad norm: 132933.676 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4742/ 159576 | consumed samples: 109488 | elapsed time per iteration (ms): 15846.4 | learning rate: 3.031E-05 | global batch size: 48 | lm loss: 6.391361E+00 | loss scale: 16384.0 | grad norm: 118703.557 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4743/ 159576 | consumed samples: 109536 | elapsed time per iteration (ms): 15513.5 | learning rate: 3.032E-05 | global batch size: 48 | lm loss: 6.428627E+00 | loss scale: 16384.0 | grad norm: 138048.574 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4744/ 159576 | consumed samples: 109584 | elapsed time per iteration (ms): 15514.2 | learning rate: 3.034E-05 | global batch size: 48 | lm loss: 6.294309E+00 | loss scale: 16384.0 | grad norm: 140003.576 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4745/ 159576 | consumed samples: 109632 | elapsed time per iteration (ms): 15479.8 | learning rate: 3.035E-05 | global batch size: 48 | lm loss: 6.442544E+00 | loss scale: 16384.0 | grad norm: 137520.854 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4746/ 159576 | consumed samples: 109680 | elapsed time per iteration (ms): 15909.9 | learning rate: 3.036E-05 | global batch size: 48 | lm loss: 6.330937E+00 | loss scale: 16384.0 | grad norm: 133869.361 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4747/ 159576 | consumed samples: 109728 | elapsed time per iteration (ms): 15438.5 | learning rate: 3.038E-05 | global batch size: 48 | lm loss: 6.375879E+00 | loss scale: 16384.0 | grad norm: 186074.447 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4748/ 159576 | consumed samples: 109776 | elapsed time per iteration (ms): 15478.1 | learning rate: 3.039E-05 | global batch size: 48 | lm loss: 6.291435E+00 | loss scale: 16384.0 | grad norm: 133042.212 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4749/ 159576 | consumed samples: 109824 | elapsed time per iteration (ms): 15511.0 | learning rate: 3.040E-05 | global batch size: 48 | lm loss: 6.392264E+00 | loss scale: 16384.0 | grad norm: 142954.276 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4750/ 159576 | consumed samples: 109872 | elapsed time per iteration (ms): 15876.7 | learning rate: 3.042E-05 | global batch size: 48 | lm loss: 7.872174E+00 | loss scale: 16384.0 | grad norm: 409825.671 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4751/ 159576 | consumed samples: 109920 | elapsed time per iteration (ms): 15539.2 | learning rate: 3.043E-05 | global batch size: 48 | lm loss: 6.478594E+00 | loss scale: 16384.0 | grad norm: 125638.703 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4752/ 159576 | consumed samples: 109968 | elapsed time per iteration (ms): 15507.7 | learning rate: 3.044E-05 | global batch size: 48 | lm loss: 6.357571E+00 | loss scale: 16384.0 | grad norm: 108403.375 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4753/ 159576 | consumed samples: 110016 | elapsed time per iteration (ms): 15485.4 | learning rate: 3.046E-05 | global batch size: 48 | lm loss: 6.517112E+00 | loss scale: 16384.0 | grad norm: 101971.645 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4754/ 159576 | consumed samples: 110064 | elapsed time per iteration (ms): 15669.7 | learning rate: 3.047E-05 | global batch size: 48 | lm loss: 6.311660E+00 | loss scale: 16384.0 | grad norm: 117424.161 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4755/ 159576 | consumed samples: 110112 | elapsed time per iteration (ms): 15529.0 | learning rate: 3.048E-05 | global batch size: 48 | lm loss: 6.452873E+00 | loss scale: 16384.0 | grad norm: 153333.779 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4756/ 159576 | consumed samples: 110160 | elapsed time per iteration (ms): 15556.8 | learning rate: 3.050E-05 | global batch size: 48 | lm loss: 6.470776E+00 | loss scale: 16384.0 | grad norm: 123606.469 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4757/ 159576 | consumed samples: 110208 | elapsed time per iteration (ms): 15535.1 | learning rate: 3.051E-05 | global batch size: 48 | lm loss: 6.444992E+00 | loss scale: 16384.0 | grad norm: 103337.864 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4758/ 159576 | consumed samples: 110256 | elapsed time per iteration (ms): 15670.4 | learning rate: 3.052E-05 | global batch size: 48 | lm loss: 6.402925E+00 | loss scale: 16384.0 | grad norm: 145142.298 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4759/ 159576 | consumed samples: 110304 | elapsed time per iteration (ms): 15615.8 | learning rate: 3.054E-05 | global batch size: 48 | lm loss: 6.383159E+00 | loss scale: 16384.0 | grad norm: 115666.450 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4760/ 159576 | consumed samples: 110352 | elapsed time per iteration (ms): 15593.7 | learning rate: 3.055E-05 | global batch size: 48 | lm loss: 6.288662E+00 | loss scale: 16384.0 | grad norm: 125590.923 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4761/ 159576 | consumed samples: 110400 | elapsed time per iteration (ms): 15582.7 | learning rate: 3.056E-05 | global batch size: 48 | lm loss: 6.460382E+00 | loss scale: 16384.0 | grad norm: 131535.871 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4762/ 159576 | consumed samples: 110448 | elapsed time per iteration (ms): 15777.3 | learning rate: 3.058E-05 | global batch size: 48 | lm loss: 6.421331E+00 | loss scale: 16384.0 | grad norm: 123507.404 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4763/ 159576 | consumed samples: 110496 | elapsed time per iteration (ms): 15542.1 | learning rate: 3.059E-05 | global batch size: 48 | lm loss: 6.471745E+00 | loss scale: 16384.0 | grad norm: 142533.784 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4764/ 159576 | consumed samples: 110544 | elapsed time per iteration (ms): 15505.7 | learning rate: 3.060E-05 | global batch size: 48 | lm loss: 6.437591E+00 | loss scale: 16384.0 | grad norm: 150206.216 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4765/ 159576 | consumed samples: 110592 | elapsed time per iteration (ms): 15784.9 | learning rate: 3.062E-05 | global batch size: 48 | lm loss: 6.426904E+00 | loss scale: 16384.0 | grad norm: 117533.195 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4766/ 159576 | consumed samples: 110640 | elapsed time per iteration (ms): 15571.9 | learning rate: 3.063E-05 | global batch size: 48 | lm loss: 6.361554E+00 | loss scale: 16384.0 | grad norm: 125319.029 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4767/ 159576 | consumed samples: 110688 | elapsed time per iteration (ms): 15502.5 | learning rate: 3.064E-05 | global batch size: 48 | lm loss: 6.404096E+00 | loss scale: 16384.0 | grad norm: 137718.459 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4768/ 159576 | consumed samples: 110736 | elapsed time per iteration (ms): 15543.8 | learning rate: 3.066E-05 | global batch size: 48 | lm loss: 6.437445E+00 | loss scale: 16384.0 | grad norm: 138623.647 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4769/ 159576 | consumed samples: 110784 | elapsed time per iteration (ms): 15859.0 | learning rate: 3.067E-05 | global batch size: 48 | lm loss: 6.395863E+00 | loss scale: 16384.0 | grad norm: 127878.926 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4770/ 159576 | consumed samples: 110832 | elapsed time per iteration (ms): 15536.9 | learning rate: 3.068E-05 | global batch size: 48 | lm loss: 6.561028E+00 | loss scale: 16384.0 | grad norm: 124917.908 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4771/ 159576 | consumed samples: 110880 | elapsed time per iteration (ms): 15506.9 | learning rate: 3.070E-05 | global batch size: 48 | lm loss: 6.471921E+00 | loss scale: 16384.0 | grad norm: 161855.552 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4772/ 159576 | consumed samples: 110928 | elapsed time per iteration (ms): 15469.5 | learning rate: 3.071E-05 | global batch size: 48 | lm loss: 6.442107E+00 | loss scale: 16384.0 | grad norm: 174619.623 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4773/ 159576 | consumed samples: 110976 | elapsed time per iteration (ms): 15874.3 | learning rate: 3.072E-05 | global batch size: 48 | lm loss: 6.450697E+00 | loss scale: 16384.0 | grad norm: 128857.784 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4774/ 159576 | consumed samples: 111024 | elapsed time per iteration (ms): 15476.2 | learning rate: 3.074E-05 | global batch size: 48 | lm loss: 6.409184E+00 | loss scale: 16384.0 | grad norm: 167963.478 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4775/ 159576 | consumed samples: 111072 | elapsed time per iteration (ms): 15524.6 | learning rate: 3.075E-05 | global batch size: 48 | lm loss: 6.521546E+00 | loss scale: 16384.0 | grad norm: 160789.278 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4776/ 159576 | consumed samples: 111120 | elapsed time per iteration (ms): 15522.1 | learning rate: 3.076E-05 | global batch size: 48 | lm loss: 6.392659E+00 | loss scale: 16384.0 | grad norm: 144341.782 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4777/ 159576 | consumed samples: 111168 | elapsed time per iteration (ms): 15807.4 | learning rate: 3.078E-05 | global batch size: 48 | lm loss: 6.295141E+00 | loss scale: 16384.0 | grad norm: 127243.790 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4778/ 159576 | consumed samples: 111216 | elapsed time per iteration (ms): 15569.3 | learning rate: 3.079E-05 | global batch size: 48 | lm loss: 6.327214E+00 | loss scale: 16384.0 | grad norm: 126284.160 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4779/ 159576 | consumed samples: 111264 | elapsed time per iteration (ms): 15403.5 | learning rate: 3.080E-05 | global batch size: 48 | lm loss: 6.573749E+00 | loss scale: 16384.0 | grad norm: 122918.062 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4780/ 159576 | consumed samples: 111312 | elapsed time per iteration (ms): 15381.1 | learning rate: 3.082E-05 | global batch size: 48 | lm loss: 6.433424E+00 | loss scale: 16384.0 | grad norm: 124694.541 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4781/ 159576 | consumed samples: 111360 | elapsed time per iteration (ms): 15664.5 | learning rate: 3.083E-05 | global batch size: 48 | lm loss: 6.469074E+00 | loss scale: 16384.0 | grad norm: 147526.104 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4782/ 159576 | consumed samples: 111408 | elapsed time per iteration (ms): 15406.6 | learning rate: 3.084E-05 | global batch size: 48 | lm loss: 6.349575E+00 | loss scale: 16384.0 | grad norm: 124417.623 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4783/ 159576 | consumed samples: 111456 | elapsed time per iteration (ms): 15497.8 | learning rate: 3.086E-05 | global batch size: 48 | lm loss: 6.254411E+00 | loss scale: 16384.0 | grad norm: 132978.536 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4784/ 159576 | consumed samples: 111504 | elapsed time per iteration (ms): 15491.3 | learning rate: 3.087E-05 | global batch size: 48 | lm loss: 6.407672E+00 | loss scale: 16384.0 | grad norm: 136226.752 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4785/ 159576 | consumed samples: 111552 | elapsed time per iteration (ms): 15585.7 | learning rate: 3.088E-05 | global batch size: 48 | lm loss: 6.340271E+00 | loss scale: 16384.0 | grad norm: 143403.987 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4786/ 159576 | consumed samples: 111600 | elapsed time per iteration (ms): 15420.5 | learning rate: 3.090E-05 | global batch size: 48 | lm loss: 6.532565E+00 | loss scale: 16384.0 | grad norm: 146250.150 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4787/ 159576 | consumed samples: 111648 | elapsed time per iteration (ms): 15350.6 | learning rate: 3.091E-05 | global batch size: 48 | lm loss: 6.387796E+00 | loss scale: 16384.0 | grad norm: 219665.169 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
[2021-09-24 21:07:15] PULSE: tr8-104B is waiting for the previous job to finish before scheduling a new one using the dependency mechanism (1165978_[1-10%1] on 'gpu_p13' partition)
[2021-09-24 21:07:15] PULSE: tr8-104B is running for 15:15:04 since 2021-09-24T05:52:11 (1162855_1 on 'gpu_p13' partition (r6i4n[5,7],r6i5n[2,7-8],r6i6n[0,2,6],r7i2n[4-5],r7i6n[2-4],r7i7n[7-8],r8i0n[2-3,5-8],r8i1n[0,2-4],r8i2n8,r8i3n[0-2],r8i5n[3-4],r8i7n[3-8],r9i0n[0-2],r9i1n[0-3],r9i2n[3-5,8],r9i3n[0-1,7-8],r9i4n[0-2],r9i5n[3-8],r9i6n[0,7-8])
iteration 4788/ 159576 | consumed samples: 111696 | elapsed time per iteration (ms): 15408.2 | learning rate: 3.092E-05 | global batch size: 48 | lm loss: 6.385682E+00 | loss scale: 16384.0 | grad norm: 135205.771 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4789/ 159576 | consumed samples: 111744 | elapsed time per iteration (ms): 15723.0 | learning rate: 3.094E-05 | global batch size: 48 | lm loss: 6.382418E+00 | loss scale: 16384.0 | grad norm: 135775.375 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4790/ 159576 | consumed samples: 111792 | elapsed time per iteration (ms): 15412.3 | learning rate: 3.095E-05 | global batch size: 48 | lm loss: 6.349115E+00 | loss scale: 16384.0 | grad norm: 161890.935 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4791/ 159576 | consumed samples: 111840 | elapsed time per iteration (ms): 15444.3 | learning rate: 3.096E-05 | global batch size: 48 | lm loss: 6.551302E+00 | loss scale: 16384.0 | grad norm: 160659.721 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4792/ 159576 | consumed samples: 111888 | elapsed time per iteration (ms): 15819.0 | learning rate: 3.098E-05 | global batch size: 48 | lm loss: 6.439594E+00 | loss scale: 16384.0 | grad norm: 133779.922 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4793/ 159576 | consumed samples: 111936 | elapsed time per iteration (ms): 15566.2 | learning rate: 3.099E-05 | global batch size: 48 | lm loss: 6.469571E+00 | loss scale: 16384.0 | grad norm: 134021.262 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4794/ 159576 | consumed samples: 111984 | elapsed time per iteration (ms): 15417.1 | learning rate: 3.100E-05 | global batch size: 48 | lm loss: 6.302731E+00 | loss scale: 16384.0 | grad norm: 144273.145 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4795/ 159576 | consumed samples: 112032 | elapsed time per iteration (ms): 15348.6 | learning rate: 3.102E-05 | global batch size: 48 | lm loss: 6.524598E+00 | loss scale: 16384.0 | grad norm: 173531.750 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4796/ 159576 | consumed samples: 112080 | elapsed time per iteration (ms): 15687.5 | learning rate: 3.103E-05 | global batch size: 48 | lm loss: 6.379292E+00 | loss scale: 16384.0 | grad norm: 135799.927 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4797/ 159576 | consumed samples: 112128 | elapsed time per iteration (ms): 15525.4 | learning rate: 3.104E-05 | global batch size: 48 | lm loss: 6.363866E+00 | loss scale: 16384.0 | grad norm: 157197.319 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4798/ 159576 | consumed samples: 112176 | elapsed time per iteration (ms): 15407.8 | learning rate: 3.106E-05 | global batch size: 48 | lm loss: 6.301018E+00 | loss scale: 16384.0 | grad norm: 157927.560 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4799/ 159576 | consumed samples: 112224 | elapsed time per iteration (ms): 15420.4 | learning rate: 3.107E-05 | global batch size: 48 | lm loss: 6.529522E+00 | loss scale: 16384.0 | grad norm: 161359.540 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4800/ 159576 | consumed samples: 112272 | elapsed time per iteration (ms): 15797.9 | learning rate: 3.108E-05 | global batch size: 48 | lm loss: 6.347914E+00 | loss scale: 16384.0 | grad norm: 147972.460 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4801/ 159576 | consumed samples: 112320 | elapsed time per iteration (ms): 15327.2 | learning rate: 3.110E-05 | global batch size: 48 | lm loss: 6.375738E+00 | loss scale: 16384.0 | grad norm: 153820.838 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4802/ 159576 | consumed samples: 112368 | elapsed time per iteration (ms): 15430.2 | learning rate: 3.111E-05 | global batch size: 48 | lm loss: 6.380699E+00 | loss scale: 16384.0 | grad norm: 200141.688 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4803/ 159576 | consumed samples: 112416 | elapsed time per iteration (ms): 15437.0 | learning rate: 3.112E-05 | global batch size: 48 | lm loss: 6.346474E+00 | loss scale: 16384.0 | grad norm: 150956.672 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4804/ 159576 | consumed samples: 112464 | elapsed time per iteration (ms): 15932.7 | learning rate: 3.114E-05 | global batch size: 48 | lm loss: 6.424392E+00 | loss scale: 16384.0 | grad norm: 144387.858 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4805/ 159576 | consumed samples: 112512 | elapsed time per iteration (ms): 15535.0 | learning rate: 3.115E-05 | global batch size: 48 | lm loss: 6.327216E+00 | loss scale: 16384.0 | grad norm: 145981.007 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4806/ 159576 | consumed samples: 112560 | elapsed time per iteration (ms): 15433.8 | learning rate: 3.116E-05 | global batch size: 48 | lm loss: 6.352614E+00 | loss scale: 16384.0 | grad norm: 159012.654 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4807/ 159576 | consumed samples: 112608 | elapsed time per iteration (ms): 15389.4 | learning rate: 3.118E-05 | global batch size: 48 | lm loss: 6.523698E+00 | loss scale: 16384.0 | grad norm: 183142.813 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4808/ 159576 | consumed samples: 112656 | elapsed time per iteration (ms): 15811.1 | learning rate: 3.119E-05 | global batch size: 48 | lm loss: 6.425416E+00 | loss scale: 16384.0 | grad norm: 158356.721 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4809/ 159576 | consumed samples: 112704 | elapsed time per iteration (ms): 15390.9 | learning rate: 3.120E-05 | global batch size: 48 | lm loss: 6.460537E+00 | loss scale: 16384.0 | grad norm: 160752.580 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4810/ 159576 | consumed samples: 112752 | elapsed time per iteration (ms): 15403.0 | learning rate: 3.122E-05 | global batch size: 48 | lm loss: 6.358703E+00 | loss scale: 16384.0 | grad norm: 136445.446 | num zeros: 0.0 | number of skipped iterations: 0 |
number of nan iterations: 0 | time (ms) iteration 4811/ 159576 | consumed samples: 112800 | elapsed time per iteration (ms): 15361.3 | learning rate: 3.123E-05 | global batch size: 48 | lm loss: 6.445686E+00 | loss scale: 16384.0 | grad norm: 150287.223 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4812/ 159576 | consumed samples: 112848 | elapsed time per iteration (ms): 15635.2 | learning rate: 3.124E-05 | global batch size: 48 | lm loss: 6.351339E+00 | loss scale: 16384.0 | grad norm: 127746.325 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4813/ 159576 | consumed samples: 112896 | elapsed time per iteration (ms): 15458.8 | learning rate: 3.126E-05 | global batch size: 48 | lm loss: 6.509888E+00 | loss scale: 16384.0 | grad norm: 142135.548 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4814/ 159576 | consumed samples: 112944 | elapsed time per iteration (ms): 15373.2 | learning rate: 3.127E-05 | global batch size: 48 | lm loss: 6.393768E+00 | loss scale: 16384.0 | grad norm: 140003.150 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4815/ 159576 | consumed samples: 112992 | elapsed time per iteration (ms): 15438.1 | learning rate: 3.128E-05 | global batch size: 48 | lm loss: 6.501161E+00 | loss scale: 16384.0 | grad norm: 148857.005 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4816/ 159576 | consumed samples: 113040 | elapsed time per iteration (ms): 15632.8 | learning rate: 3.130E-05 | global batch size: 48 | lm loss: 6.330061E+00 | loss scale: 16384.0 | grad norm: 147693.703 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4817/ 159576 | consumed samples: 113088 | elapsed time per iteration (ms): 15360.6 | learning rate: 
3.131E-05 | global batch size: 48 | lm loss: 6.405270E+00 | loss scale: 16384.0 | grad norm: 135039.455 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4818/ 159576 | consumed samples: 113136 | elapsed time per iteration (ms): 15427.5 | learning rate: 3.132E-05 | global batch size: 48 | lm loss: 6.376327E+00 | loss scale: 16384.0 | grad norm: 144860.784 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4819/ 159576 | consumed samples: 113184 | elapsed time per iteration (ms): 15402.3 | learning rate: 3.134E-05 | global batch size: 48 | lm loss: 6.422782E+00 | loss scale: 16384.0 | grad norm: 185430.422 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4820/ 159576 | consumed samples: 113232 | elapsed time per iteration (ms): 15872.7 | learning rate: 3.135E-05 | global batch size: 48 | lm loss: 6.447948E+00 | loss scale: 16384.0 | grad norm: 143563.779 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4821/ 159576 | consumed samples: 113280 | elapsed time per iteration (ms): 15475.0 | learning rate: 3.136E-05 | global batch size: 48 | lm loss: 6.419926E+00 | loss scale: 16384.0 | grad norm: 139618.440 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4822/ 159576 | consumed samples: 113328 | elapsed time per iteration (ms): 15479.8 | learning rate: 3.138E-05 | global batch size: 48 | lm loss: 6.307784E+00 | loss scale: 16384.0 | grad norm: 135923.320 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4823/ 159576 | consumed samples: 113376 | elapsed time per iteration (ms): 15830.9 | learning rate: 3.139E-05 | global batch size: 48 | lm loss: 6.485186E+00 | loss scale: 16384.0 | grad norm: 148878.956 | num zeros: 0.0 | number of skipped iterations: 0 | 
number of nan iterations: 0 | time (ms) iteration 4824/ 159576 | consumed samples: 113424 | elapsed time per iteration (ms): 15412.5 | learning rate: 3.140E-05 | global batch size: 48 | lm loss: 6.344635E+00 | loss scale: 16384.0 | grad norm: 144634.532 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4825/ 159576 | consumed samples: 113472 | elapsed time per iteration (ms): 15399.2 | learning rate: 3.142E-05 | global batch size: 48 | lm loss: 6.380017E+00 | loss scale: 16384.0 | grad norm: 149087.377 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4826/ 159576 | consumed samples: 113520 | elapsed time per iteration (ms): 15495.5 | learning rate: 3.143E-05 | global batch size: 48 | lm loss: 6.478100E+00 | loss scale: 16384.0 | grad norm: 157916.270 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4827/ 159576 | consumed samples: 113568 | elapsed time per iteration (ms): 15748.7 | learning rate: 3.144E-05 | global batch size: 48 | lm loss: 6.353170E+00 | loss scale: 16384.0 | grad norm: 130626.129 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4828/ 159576 | consumed samples: 113616 | elapsed time per iteration (ms): 15356.7 | learning rate: 3.146E-05 | global batch size: 48 | lm loss: 6.307143E+00 | loss scale: 16384.0 | grad norm: 152222.347 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4829/ 159576 | consumed samples: 113664 | elapsed time per iteration (ms): 15426.2 | learning rate: 3.147E-05 | global batch size: 48 | lm loss: 6.284460E+00 | loss scale: 16384.0 | grad norm: 135151.282 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4830/ 159576 | consumed samples: 113712 | elapsed time per iteration (ms): 15453.2 | learning rate: 
3.148E-05 | global batch size: 48 | lm loss: 6.389065E+00 | loss scale: 16384.0 | grad norm: 158822.080 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4831/ 159576 | consumed samples: 113760 | elapsed time per iteration (ms): 15757.8 | learning rate: 3.150E-05 | global batch size: 48 | lm loss: 6.330949E+00 | loss scale: 16384.0 | grad norm: 150077.176 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4832/ 159576 | consumed samples: 113808 | elapsed time per iteration (ms): 8582.4 | learning rate: 3.150E-05 | global batch size: 48 | lm loss: 6.330990E+00 | loss scale: 8192.0 | grad norm: 150077.176 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4833/ 159576 | consumed samples: 113856 | elapsed time per iteration (ms): 14858.8 | learning rate: 3.151E-05 | global batch size: 48 | lm loss: 6.472740E+00 | loss scale: 8192.0 | grad norm: 80806.673 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4834/ 159576 | consumed samples: 113904 | elapsed time per iteration (ms): 15406.5 | learning rate: 3.152E-05 | global batch size: 48 | lm loss: 6.386261E+00 | loss scale: 8192.0 | grad norm: 79982.750 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4835/ 159576 | consumed samples: 113952 | elapsed time per iteration (ms): 15754.6 | learning rate: 3.154E-05 | global batch size: 48 | lm loss: 6.399200E+00 | loss scale: 8192.0 | grad norm: 76427.802 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4836/ 159576 | consumed samples: 114000 | elapsed time per iteration (ms): 15606.6 | learning rate: 3.155E-05 | global batch size: 48 | lm loss: 6.377688E+00 | loss scale: 8192.0 | grad norm: 72730.651 | num zeros: 0.0 | number of skipped iterations: 0 | number of 
nan iterations: 0 | time (ms) iteration 4837/ 159576 | consumed samples: 114048 | elapsed time per iteration (ms): 15427.9 | learning rate: 3.156E-05 | global batch size: 48 | lm loss: 6.362796E+00 | loss scale: 8192.0 | grad norm: 75031.879 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4838/ 159576 | consumed samples: 114096 | elapsed time per iteration (ms): 15459.9 | learning rate: 3.158E-05 | global batch size: 48 | lm loss: 6.427638E+00 | loss scale: 8192.0 | grad norm: 71627.109 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4839/ 159576 | consumed samples: 114144 | elapsed time per iteration (ms): 15785.4 | learning rate: 3.159E-05 | global batch size: 48 | lm loss: 6.319674E+00 | loss scale: 8192.0 | grad norm: 75857.181 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4840/ 159576 | consumed samples: 114192 | elapsed time per iteration (ms): 15529.1 | learning rate: 3.160E-05 | global batch size: 48 | lm loss: 6.453057E+00 | loss scale: 8192.0 | grad norm: 81110.609 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4841/ 159576 | consumed samples: 114240 | elapsed time per iteration (ms): 15426.5 | learning rate: 3.162E-05 | global batch size: 48 | lm loss: 6.411851E+00 | loss scale: 8192.0 | grad norm: 86983.700 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4842/ 159576 | consumed samples: 114288 | elapsed time per iteration (ms): 15460.5 | learning rate: 3.163E-05 | global batch size: 48 | lm loss: 6.377954E+00 | loss scale: 8192.0 | grad norm: 86981.542 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4843/ 159576 | consumed samples: 114336 | elapsed time per iteration (ms): 15821.2 | learning rate: 3.164E-05 | global batch 
size: 48 | lm loss: 6.577933E+00 | loss scale: 8192.0 | grad norm: 91346.895 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4844/ 159576 | consumed samples: 114384 | elapsed time per iteration (ms): 15501.1 | learning rate: 3.166E-05 | global batch size: 48 | lm loss: 6.404775E+00 | loss scale: 8192.0 | grad norm: 73191.069 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4845/ 159576 | consumed samples: 114432 | elapsed time per iteration (ms): 15559.3 | learning rate: 3.167E-05 | global batch size: 48 | lm loss: 6.405911E+00 | loss scale: 8192.0 | grad norm: 77252.377 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4846/ 159576 | consumed samples: 114480 | elapsed time per iteration (ms): 15521.7 | learning rate: 3.168E-05 | global batch size: 48 | lm loss: 6.505279E+00 | loss scale: 8192.0 | grad norm: 70335.265 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4847/ 159576 | consumed samples: 114528 | elapsed time per iteration (ms): 15925.0 | learning rate: 3.170E-05 | global batch size: 48 | lm loss: 6.438465E+00 | loss scale: 8192.0 | grad norm: 73213.704 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4848/ 159576 | consumed samples: 114576 | elapsed time per iteration (ms): 15612.2 | learning rate: 3.171E-05 | global batch size: 48 | lm loss: 6.452498E+00 | loss scale: 8192.0 | grad norm: 78502.943 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4849/ 159576 | consumed samples: 114624 | elapsed time per iteration (ms): 15443.4 | learning rate: 3.172E-05 | global batch size: 48 | lm loss: 6.394375E+00 | loss scale: 8192.0 | grad norm: 87781.535 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) 
iteration 4850/ 159576 | consumed samples: 114672 | elapsed time per iteration (ms): 15479.4 | learning rate: 3.174E-05 | global batch size: 48 | lm loss: 6.435881E+00 | loss scale: 8192.0 | grad norm: 73932.494 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4851/ 159576 | consumed samples: 114720 | elapsed time per iteration (ms): 15706.9 | learning rate: 3.175E-05 | global batch size: 48 | lm loss: 6.482435E+00 | loss scale: 8192.0 | grad norm: 80407.010 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4852/ 159576 | consumed samples: 114768 | elapsed time per iteration (ms): 15526.6 | learning rate: 3.176E-05 | global batch size: 48 | lm loss: 6.479346E+00 | loss scale: 8192.0 | grad norm: 88804.640 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4853/ 159576 | consumed samples: 114816 | elapsed time per iteration (ms): 15581.7 | learning rate: 3.178E-05 | global batch size: 48 | lm loss: 6.398011E+00 | loss scale: 8192.0 | grad norm: 85238.079 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4854/ 159576 | consumed samples: 114864 | elapsed time per iteration (ms): 15591.6 | learning rate: 3.179E-05 | global batch size: 48 | lm loss: 6.439957E+00 | loss scale: 8192.0 | grad norm: 79088.978 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4855/ 159576 | consumed samples: 114912 | elapsed time per iteration (ms): 15588.2 | learning rate: 3.180E-05 | global batch size: 48 | lm loss: 6.525852E+00 | loss scale: 8192.0 | grad norm: 86759.095 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4856/ 159576 | consumed samples: 114960 | elapsed time per iteration (ms): 15491.8 | learning rate: 3.182E-05 | global batch size: 48 | lm loss: 6.406517E+00 | 
loss scale: 8192.0 | grad norm: 84644.761 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4857/ 159576 | consumed samples: 115008 | elapsed time per iteration (ms): 15455.8 | learning rate: 3.183E-05 | global batch size: 48 | lm loss: 6.427845E+00 | loss scale: 8192.0 | grad norm: 95490.221 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4858/ 159576 | consumed samples: 115056 | elapsed time per iteration (ms): 15508.2 | learning rate: 3.184E-05 | global batch size: 48 | lm loss: 6.500411E+00 | loss scale: 8192.0 | grad norm: 101236.693 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4859/ 159576 | consumed samples: 115104 | elapsed time per iteration (ms): 15652.7 | learning rate: 3.186E-05 | global batch size: 48 | lm loss: 6.364994E+00 | loss scale: 8192.0 | grad norm: 91582.098 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4860/ 159576 | consumed samples: 115152 | elapsed time per iteration (ms): 15517.9 | learning rate: 3.187E-05 | global batch size: 48 | lm loss: 6.449871E+00 | loss scale: 8192.0 | grad norm: 66096.086 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4861/ 159576 | consumed samples: 115200 | elapsed time per iteration (ms): 15569.1 | learning rate: 3.188E-05 | global batch size: 48 | lm loss: 6.364583E+00 | loss scale: 8192.0 | grad norm: 83574.580 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4862/ 159576 | consumed samples: 115248 | elapsed time per iteration (ms): 15872.9 | learning rate: 3.189E-05 | global batch size: 48 | lm loss: 6.322206E+00 | loss scale: 8192.0 | grad norm: 76576.722 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4863/ 159576 | consumed 
samples: 115296 | elapsed time per iteration (ms): 15519.6 | learning rate: 3.191E-05 | global batch size: 48 | lm loss: 6.475718E+00 | loss scale: 8192.0 | grad norm: 68002.307 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4864/ 159576 | consumed samples: 115344 | elapsed time per iteration (ms): 15516.6 | learning rate: 3.192E-05 | global batch size: 48 | lm loss: 6.312770E+00 | loss scale: 8192.0 | grad norm: 83359.552 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4865/ 159576 | consumed samples: 115392 | elapsed time per iteration (ms): 15489.9 | learning rate: 3.193E-05 | global batch size: 48 | lm loss: 6.447346E+00 | loss scale: 8192.0 | grad norm: 79898.278 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4866/ 159576 | consumed samples: 115440 | elapsed time per iteration (ms): 15854.0 | learning rate: 3.195E-05 | global batch size: 48 | lm loss: 6.343767E+00 | loss scale: 8192.0 | grad norm: 82915.939 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4867/ 159576 | consumed samples: 115488 | elapsed time per iteration (ms): 15538.2 | learning rate: 3.196E-05 | global batch size: 48 | lm loss: 6.421945E+00 | loss scale: 8192.0 | grad norm: 76629.129 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4868/ 159576 | consumed samples: 115536 | elapsed time per iteration (ms): 15524.2 | learning rate: 3.197E-05 | global batch size: 48 | lm loss: 6.402726E+00 | loss scale: 8192.0 | grad norm: 75429.794 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4869/ 159576 | consumed samples: 115584 | elapsed time per iteration (ms): 15553.9 | learning rate: 3.199E-05 | global batch size: 48 | lm loss: 6.417988E+00 | loss scale: 8192.0 | grad norm: 
82790.972 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4870/ 159576 | consumed samples: 115632 | elapsed time per iteration (ms): 15916.9 | learning rate: 3.200E-05 | global batch size: 48 | lm loss: 6.289523E+00 | loss scale: 8192.0 | grad norm: 77156.616 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4871/ 159576 | consumed samples: 115680 | elapsed time per iteration (ms): 15548.8 | learning rate: 3.201E-05 | global batch size: 48 | lm loss: 6.359477E+00 | loss scale: 8192.0 | grad norm: 94063.189 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4872/ 159576 | consumed samples: 115728 | elapsed time per iteration (ms): 15482.5 | learning rate: 3.203E-05 | global batch size: 48 | lm loss: 6.386482E+00 | loss scale: 8192.0 | grad norm: 70658.588 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4873/ 159576 | consumed samples: 115776 | elapsed time per iteration (ms): 15555.0 | learning rate: 3.204E-05 | global batch size: 48 | lm loss: 6.524825E+00 | loss scale: 8192.0 | grad norm: 86322.654 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4874/ 159576 | consumed samples: 115824 | elapsed time per iteration (ms): 15950.6 | learning rate: 3.205E-05 | global batch size: 48 | lm loss: 6.358710E+00 | loss scale: 8192.0 | grad norm: 73619.690 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4875/ 159576 | consumed samples: 115872 | elapsed time per iteration (ms): 15559.5 | learning rate: 3.207E-05 | global batch size: 48 | lm loss: 6.536497E+00 | loss scale: 8192.0 | grad norm: 89786.377 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4876/ 159576 | consumed samples: 115920 | elapsed time 
per iteration (ms): 15463.5 | learning rate: 3.208E-05 | global batch size: 48 | lm loss: 6.427877E+00 | loss scale: 8192.0 | grad norm: 78839.432 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4877/ 159576 | consumed samples: 115968 | elapsed time per iteration (ms): 15525.4 | learning rate: 3.209E-05 | global batch size: 48 | lm loss: 6.471958E+00 | loss scale: 8192.0 | grad norm: 76472.776 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4878/ 159576 | consumed samples: 116016 | elapsed time per iteration (ms): 15732.8 | learning rate: 3.211E-05 | global batch size: 48 | lm loss: 6.437389E+00 | loss scale: 8192.0 | grad norm: 86320.939 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4879/ 159576 | consumed samples: 116064 | elapsed time per iteration (ms): 15464.9 | learning rate: 3.212E-05 | global batch size: 48 | lm loss: 6.365283E+00 | loss scale: 8192.0 | grad norm: 82080.986 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4880/ 159576 | consumed samples: 116112 | elapsed time per iteration (ms): 15552.2 | learning rate: 3.213E-05 | global batch size: 48 | lm loss: 6.408097E+00 | loss scale: 8192.0 | grad norm: 79728.972 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4881/ 159576 | consumed samples: 116160 | elapsed time per iteration (ms): 15532.2 | learning rate: 3.215E-05 | global batch size: 48 | lm loss: 6.425485E+00 | loss scale: 8192.0 | grad norm: 102265.159 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4882/ 159576 | consumed samples: 116208 | elapsed time per iteration (ms): 15707.7 | learning rate: 3.216E-05 | global batch size: 48 | lm loss: 6.276470E+00 | loss scale: 8192.0 | grad norm: 93438.364 | num zeros: 0.0 | 
number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4883/ 159576 | consumed samples: 116256 | elapsed time per iteration (ms): 15592.8 | learning rate: 3.217E-05 | global batch size: 48 | lm loss: 6.487882E+00 | loss scale: 8192.0 | grad norm: 85760.044 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4884/ 159576 | consumed samples: 116304 | elapsed time per iteration (ms): 15486.2 | learning rate: 3.219E-05 | global batch size: 48 | lm loss: 6.412776E+00 | loss scale: 8192.0 | grad norm: 84281.777 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4885/ 159576 | consumed samples: 116352 | elapsed time per iteration (ms): 15807.2 | learning rate: 3.220E-05 | global batch size: 48 | lm loss: 6.340213E+00 | loss scale: 8192.0 | grad norm: 79000.522 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4886/ 159576 | consumed samples: 116400 | elapsed time per iteration (ms): 15690.6 | learning rate: 3.221E-05 | global batch size: 48 | lm loss: 6.368945E+00 | loss scale: 8192.0 | grad norm: 101421.475 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4887/ 159576 | consumed samples: 116448 | elapsed time per iteration (ms): 15490.9 | learning rate: 3.223E-05 | global batch size: 48 | lm loss: 6.181931E+00 | loss scale: 8192.0 | grad norm: 80306.960 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4888/ 159576 | consumed samples: 116496 | elapsed time per iteration (ms): 15541.0 | learning rate: 3.224E-05 | global batch size: 48 | lm loss: 6.508174E+00 | loss scale: 8192.0 | grad norm: 88863.260 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4889/ 159576 | consumed samples: 116544 | elapsed time per iteration (ms): 15795.9 | 
learning rate: 3.225E-05 | global batch size: 48 | lm loss: 6.362309E+00 | loss scale: 8192.0 | grad norm: 82730.432 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4890/ 159576 | consumed samples: 116592 | elapsed time per iteration (ms): 15612.5 | learning rate: 3.227E-05 | global batch size: 48 | lm loss: 6.457442E+00 | loss scale: 8192.0 | grad norm: 77751.832 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4891/ 159576 | consumed samples: 116640 | elapsed time per iteration (ms): 15523.7 | learning rate: 3.228E-05 | global batch size: 48 | lm loss: 6.382168E+00 | loss scale: 8192.0 | grad norm: 95335.147 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4892/ 159576 | consumed samples: 116688 | elapsed time per iteration (ms): 15565.3 | learning rate: 3.229E-05 | global batch size: 48 | lm loss: 6.443634E+00 | loss scale: 8192.0 | grad norm: 141532.607 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4893/ 159576 | consumed samples: 116736 | elapsed time per iteration (ms): 15920.8 | learning rate: 3.231E-05 | global batch size: 48 | lm loss: 6.475467E+00 | loss scale: 8192.0 | grad norm: 99006.769 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4894/ 159576 | consumed samples: 116784 | elapsed time per iteration (ms): 15438.9 | learning rate: 3.232E-05 | global batch size: 48 | lm loss: 6.465964E+00 | loss scale: 8192.0 | grad norm: 104819.919 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4895/ 159576 | consumed samples: 116832 | elapsed time per iteration (ms): 15486.6 | learning rate: 3.233E-05 | global batch size: 48 | lm loss: 6.355396E+00 | loss scale: 8192.0 | grad norm: 88645.070 | num zeros: 0.0 | number of skipped iterations: 0 
| number of nan iterations: 0 | time (ms) iteration 4896/ 159576 | consumed samples: 116880 | elapsed time per iteration (ms): 15530.2 | learning rate: 3.235E-05 | global batch size: 48 | lm loss: 6.397956E+00 | loss scale: 8192.0 | grad norm: 97080.394 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4897/ 159576 | consumed samples: 116928 | elapsed time per iteration (ms): 15972.1 | learning rate: 3.236E-05 | global batch size: 48 | lm loss: 6.376213E+00 | loss scale: 8192.0 | grad norm: 91571.932 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4898/ 159576 | consumed samples: 116976 | elapsed time per iteration (ms): 15582.4 | learning rate: 3.237E-05 | global batch size: 48 | lm loss: 6.338162E+00 | loss scale: 8192.0 | grad norm: 95029.310 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4899/ 159576 | consumed samples: 117024 | elapsed time per iteration (ms): 15514.7 | learning rate: 3.239E-05 | global batch size: 48 | lm loss: 6.420194E+00 | loss scale: 8192.0 | grad norm: 115966.472 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4900/ 159576 | consumed samples: 117072 | elapsed time per iteration (ms): 15492.3 | learning rate: 3.240E-05 | global batch size: 48 | lm loss: 6.472268E+00 | loss scale: 8192.0 | grad norm: 117112.305 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4901/ 159576 | consumed samples: 117120 | elapsed time per iteration (ms): 15707.8 | learning rate: 3.241E-05 | global batch size: 48 | lm loss: 6.365590E+00 | loss scale: 8192.0 | grad norm: 126111.497 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4902/ 159576 | consumed samples: 117168 | elapsed time per iteration (ms): 15440.6 | learning rate: 3.243E-05 | 
global batch size: 48 | lm loss: 6.341323E+00 | loss scale: 8192.0 | grad norm: 141040.178 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4903/ 159576 | consumed samples: 117216 | elapsed time per iteration (ms): 15486.6 | learning rate: 3.244E-05 | global batch size: 48 | lm loss: 6.294356E+00 | loss scale: 8192.0 | grad norm: 92893.758 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4904/ 159576 | consumed samples: 117264 | elapsed time per iteration (ms): 15374.1 | learning rate: 3.245E-05 | global batch size: 48 | lm loss: 6.459288E+00 | loss scale: 8192.0 | grad norm: 105593.680 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4905/ 159576 | consumed samples: 117312 | elapsed time per iteration (ms): 15525.3 | learning rate: 3.247E-05 | global batch size: 48 | lm loss: 6.321597E+00 | loss scale: 8192.0 | grad norm: 92345.299 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4906/ 159576 | consumed samples: 117360 | elapsed time per iteration (ms): 15464.1 | learning rate: 3.248E-05 | global batch size: 48 | lm loss: 6.394690E+00 | loss scale: 8192.0 | grad norm: 115046.817 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4907/ 159576 | consumed samples: 117408 | elapsed time per iteration (ms): 15463.2 | learning rate: 3.249E-05 | global batch size: 48 | lm loss: 6.382209E+00 | loss scale: 8192.0 | grad norm: 129712.277 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4908/ 159576 | consumed samples: 117456 | elapsed time per iteration (ms): 15513.8 | learning rate: 3.251E-05 | global batch size: 48 | lm loss: 6.406621E+00 | loss scale: 8192.0 | grad norm: 97342.857 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan 
iterations: 0 | time (ms) iteration 4909/ 159576 | consumed samples: 117504 | elapsed time per iteration (ms): 15695.2 | learning rate: 3.252E-05 | global batch size: 48 | lm loss: 6.313143E+00 | loss scale: 8192.0 | grad norm: 113026.841 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4910/ 159576 | consumed samples: 117552 | elapsed time per iteration (ms): 15443.0 | learning rate: 3.253E-05 | global batch size: 48 | lm loss: 6.450486E+00 | loss scale: 8192.0 | grad norm: 95063.553 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4911/ 159576 | consumed samples: 117600 | elapsed time per iteration (ms): 15416.6 | learning rate: 3.255E-05 | global batch size: 48 | lm loss: 6.485876E+00 | loss scale: 8192.0 | grad norm: 102064.281 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4912/ 159576 | consumed samples: 117648 | elapsed time per iteration (ms): 15823.7 | learning rate: 3.256E-05 | global batch size: 48 | lm loss: 6.276315E+00 | loss scale: 8192.0 | grad norm: 114959.499 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4913/ 159576 | consumed samples: 117696 | elapsed time per iteration (ms): 15625.5 | learning rate: 3.257E-05 | global batch size: 48 | lm loss: 6.405933E+00 | loss scale: 8192.0 | grad norm: 117232.383 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4914/ 159576 | consumed samples: 117744 | elapsed time per iteration (ms): 15455.3 | learning rate: 3.259E-05 | global batch size: 48 | lm loss: 6.233083E+00 | loss scale: 8192.0 | grad norm: 109853.141 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4915/ 159576 | consumed samples: 117792 | elapsed time per iteration (ms): 15594.3 | learning rate: 3.260E-05 | global batch 
size: 48 | lm loss: 6.418136E+00 | loss scale: 8192.0 | grad norm: 108180.861 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4916/ 159576 | consumed samples: 117840 | elapsed time per iteration (ms): 15954.3 | learning rate: 3.261E-05 | global batch size: 48 | lm loss: 6.385183E+00 | loss scale: 8192.0 | grad norm: 103614.011 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4917/ 159576 | consumed samples: 117888 | elapsed time per iteration (ms): 15458.8 | learning rate: 3.263E-05 | global batch size: 48 | lm loss: 6.341071E+00 | loss scale: 8192.0 | grad norm: 87833.153 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4918/ 159576 | consumed samples: 117936 | elapsed time per iteration (ms): 15501.3 | learning rate: 3.264E-05 | global batch size: 48 | lm loss: 6.418250E+00 | loss scale: 8192.0 | grad norm: 91681.912 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4919/ 159576 | consumed samples: 117984 | elapsed time per iteration (ms): 15446.3 | learning rate: 3.265E-05 | global batch size: 48 | lm loss: 6.298886E+00 | loss scale: 8192.0 | grad norm: 98048.635 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4920/ 159576 | consumed samples: 118032 | elapsed time per iteration (ms): 15905.0 | learning rate: 3.267E-05 | global batch size: 48 | lm loss: 6.413123E+00 | loss scale: 8192.0 | grad norm: 103541.248 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4921/ 159576 | consumed samples: 118080 | elapsed time per iteration (ms): 15416.1 | learning rate: 3.268E-05 | global batch size: 48 | lm loss: 6.282074E+00 | loss scale: 8192.0 | grad norm: 100452.621 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time 
(ms) iteration 4922/ 159576 | consumed samples: 118128 | elapsed time per iteration (ms): 15499.9 | learning rate: 3.269E-05 | global batch size: 48 | lm loss: 6.371088E+00 | loss scale: 8192.0 | grad norm: 118401.522 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4923/ 159576 | consumed samples: 118176 | elapsed time per iteration (ms): 15522.6 | learning rate: 3.271E-05 | global batch size: 48 | lm loss: 6.399379E+00 | loss scale: 8192.0 | grad norm: 100877.549 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4924/ 159576 | consumed samples: 118224 | elapsed time per iteration (ms): 15859.1 | learning rate: 3.272E-05 | global batch size: 48 | lm loss: 6.450886E+00 | loss scale: 8192.0 | grad norm: 115997.698 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4925/ 159576 | consumed samples: 118272 | elapsed time per iteration (ms): 15622.0 | learning rate: 3.273E-05 | global batch size: 48 | lm loss: 6.412412E+00 | loss scale: 8192.0 | grad norm: 121229.477 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4926/ 159576 | consumed samples: 118320 | elapsed time per iteration (ms): 15522.5 | learning rate: 3.275E-05 | global batch size: 48 | lm loss: 6.276751E+00 | loss scale: 8192.0 | grad norm: 127323.029 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4927/ 159576 | consumed samples: 118368 | elapsed time per iteration (ms): 15489.0 | learning rate: 3.276E-05 | global batch size: 48 | lm loss: 6.328137E+00 | loss scale: 8192.0 | grad norm: 109231.572 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4928/ 159576 | consumed samples: 118416 | elapsed time per iteration (ms): 15679.3 | learning rate: 3.277E-05 | global batch size: 48 | lm loss: 
6.343997E+00 | loss scale: 8192.0 | grad norm: 94463.087 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4929/ 159576 | consumed samples: 118464 | elapsed time per iteration (ms): 15506.4 | learning rate: 3.279E-05 | global batch size: 48 | lm loss: 6.367960E+00 | loss scale: 8192.0 | grad norm: 104644.038 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4930/ 159576 | consumed samples: 118512 | elapsed time per iteration (ms): 15552.6 | learning rate: 3.280E-05 | global batch size: 48 | lm loss: 6.375040E+00 | loss scale: 8192.0 | grad norm: 108080.731 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4931/ 159576 | consumed samples: 118560 | elapsed time per iteration (ms): 15566.6 | learning rate: 3.281E-05 | global batch size: 48 | lm loss: 6.468022E+00 | loss scale: 8192.0 | grad norm: 98813.039 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4932/ 159576 | consumed samples: 118608 | elapsed time per iteration (ms): 15633.8 | learning rate: 3.283E-05 | global batch size: 48 | lm loss: 6.478949E+00 | loss scale: 8192.0 | grad norm: 119522.152 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4933/ 159576 | consumed samples: 118656 | elapsed time per iteration (ms): 15451.3 | learning rate: 3.284E-05 | global batch size: 48 | lm loss: 6.415487E+00 | loss scale: 8192.0 | grad norm: 121029.519 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4934/ 159576 | consumed samples: 118704 | elapsed time per iteration (ms): 15537.9 | learning rate: 3.285E-05 | global batch size: 48 | lm loss: 6.436414E+00 | loss scale: 8192.0 | grad norm: 114108.101 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 
4935/ 159576 | consumed samples: 118752 | elapsed time per iteration (ms): 15442.4 | learning rate: 3.287E-05 | global batch size: 48 | lm loss: 6.380546E+00 | loss scale: 8192.0 | grad norm: 102153.332 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4936/ 159576 | consumed samples: 118800 | elapsed time per iteration (ms): 15674.3 | learning rate: 3.288E-05 | global batch size: 48 | lm loss: 6.524636E+00 | loss scale: 8192.0 | grad norm: 89702.742 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4937/ 159576 | consumed samples: 118848 | elapsed time per iteration (ms): 15501.6 | learning rate: 3.289E-05 | global batch size: 48 | lm loss: 6.352899E+00 | loss scale: 8192.0 | grad norm: 106241.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4938/ 159576 | consumed samples: 118896 | elapsed time per iteration (ms): 15494.9 | learning rate: 3.291E-05 | global batch size: 48 | lm loss: 6.292633E+00 | loss scale: 8192.0 | grad norm: 95129.966 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4939/ 159576 | consumed samples: 118944 | elapsed time per iteration (ms): 15936.8 | learning rate: 3.292E-05 | global batch size: 48 | lm loss: 6.337314E+00 | loss scale: 8192.0 | grad norm: 120723.828 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4940/ 159576 | consumed samples: 118992 | elapsed time per iteration (ms): 15531.1 | learning rate: 3.293E-05 | global batch size: 48 | lm loss: 6.391080E+00 | loss scale: 8192.0 | grad norm: 145548.804 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4941/ 159576 | consumed samples: 119040 | elapsed time per iteration (ms): 15466.0 | learning rate: 3.295E-05 | global batch size: 48 | lm loss: 6.343481E+00 | loss 
scale: 8192.0 | grad norm: 211104.534 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4942/ 159576 | consumed samples: 119088 | elapsed time per iteration (ms): 15505.4 | learning rate: 3.296E-05 | global batch size: 48 | lm loss: 6.528688E+00 | loss scale: 8192.0 | grad norm: 140909.560 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4943/ 159576 | consumed samples: 119136 | elapsed time per iteration (ms): 15830.2 | learning rate: 3.297E-05 | global batch size: 48 | lm loss: 6.411016E+00 | loss scale: 8192.0 | grad norm: 127370.305 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4944/ 159576 | consumed samples: 119184 | elapsed time per iteration (ms): 15400.2 | learning rate: 3.299E-05 | global batch size: 48 | lm loss: 6.483131E+00 | loss scale: 8192.0 | grad norm: 104651.898 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4945/ 159576 | consumed samples: 119232 | elapsed time per iteration (ms): 15491.5 | learning rate: 3.300E-05 | global batch size: 48 | lm loss: 6.509373E+00 | loss scale: 8192.0 | grad norm: 129067.934 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4946/ 159576 | consumed samples: 119280 | elapsed time per iteration (ms): 15557.0 | learning rate: 3.301E-05 | global batch size: 48 | lm loss: 6.338033E+00 | loss scale: 8192.0 | grad norm: 111737.692 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4947/ 159576 | consumed samples: 119328 | elapsed time per iteration (ms): 15880.4 | learning rate: 3.303E-05 | global batch size: 48 | lm loss: 6.346412E+00 | loss scale: 8192.0 | grad norm: 105173.160 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4948/ 159576 | 
consumed samples: 119376 | elapsed time per iteration (ms): 15470.3 | learning rate: 3.304E-05 | global batch size: 48 | lm loss: 6.433241E+00 | loss scale: 8192.0 | grad norm: 117253.932 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4949/ 159576 | consumed samples: 119424 | elapsed time per iteration (ms): 15464.0 | learning rate: 3.305E-05 | global batch size: 48 | lm loss: 6.408391E+00 | loss scale: 8192.0 | grad norm: 100408.960 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4950/ 159576 | consumed samples: 119472 | elapsed time per iteration (ms): 15498.5 | learning rate: 3.307E-05 | global batch size: 48 | lm loss: 6.403716E+00 | loss scale: 8192.0 | grad norm: 124240.587 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4951/ 159576 | consumed samples: 119520 | elapsed time per iteration (ms): 15815.9 | learning rate: 3.308E-05 | global batch size: 48 | lm loss: 6.389519E+00 | loss scale: 8192.0 | grad norm: 100463.890 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4952/ 159576 | consumed samples: 119568 | elapsed time per iteration (ms): 15557.3 | learning rate: 3.309E-05 | global batch size: 48 | lm loss: 6.505785E+00 | loss scale: 8192.0 | grad norm: 106487.068 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4953/ 159576 | consumed samples: 119616 | elapsed time per iteration (ms): 15479.5 | learning rate: 3.311E-05 | global batch size: 48 | lm loss: 6.381755E+00 | loss scale: 8192.0 | grad norm: 102228.411 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4954/ 159576 | consumed samples: 119664 | elapsed time per iteration (ms): 15481.8 | learning rate: 3.312E-05 | global batch size: 48 | lm loss: 6.379836E+00 | loss scale: 8192.0 
| grad norm: 118394.733 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4955/ 159576 | consumed samples: 119712 | elapsed time per iteration (ms): 15784.5 | learning rate: 3.313E-05 | global batch size: 48 | lm loss: 6.475849E+00 | loss scale: 8192.0 | grad norm: 122087.327 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4956/ 159576 | consumed samples: 119760 | elapsed time per iteration (ms): 15436.2 | learning rate: 3.315E-05 | global batch size: 48 | lm loss: 6.490977E+00 | loss scale: 8192.0 | grad norm: 123577.161 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4957/ 159576 | consumed samples: 119808 | elapsed time per iteration (ms): 15420.1 | learning rate: 3.316E-05 | global batch size: 48 | lm loss: 6.418243E+00 | loss scale: 8192.0 | grad norm: 146260.906 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4958/ 159576 | consumed samples: 119856 | elapsed time per iteration (ms): 15433.1 | learning rate: 3.317E-05 | global batch size: 48 | lm loss: 6.375823E+00 | loss scale: 8192.0 | grad norm: 102943.358 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4959/ 159576 | consumed samples: 119904 | elapsed time per iteration (ms): 15549.7 | learning rate: 3.319E-05 | global batch size: 48 | lm loss: 6.454865E+00 | loss scale: 8192.0 | grad norm: 95733.097 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4960/ 159576 | consumed samples: 119952 | elapsed time per iteration (ms): 15477.0 | learning rate: 3.320E-05 | global batch size: 48 | lm loss: 6.376845E+00 | loss scale: 8192.0 | grad norm: 105409.137 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4961/ 159576 | consumed samples: 
120000 | elapsed time per iteration (ms): 15553.6 | learning rate: 3.321E-05 | global batch size: 48 | lm loss: 6.369764E+00 | loss scale: 8192.0 | grad norm: 100426.286 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4962/ 159576 | consumed samples: 120048 | elapsed time per iteration (ms): 15567.9 | learning rate: 3.323E-05 | global batch size: 48 | lm loss: 6.386555E+00 | loss scale: 8192.0 | grad norm: 100112.758 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4963/ 159576 | consumed samples: 120096 | elapsed time per iteration (ms): 15733.5 | learning rate: 3.324E-05 | global batch size: 48 | lm loss: 6.487816E+00 | loss scale: 8192.0 | grad norm: 117343.127 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4964/ 159576 | consumed samples: 120144 | elapsed time per iteration (ms): 15368.5 | learning rate: 3.325E-05 | global batch size: 48 | lm loss: 6.415962E+00 | loss scale: 8192.0 | grad norm: 98866.878 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4965/ 159576 | consumed samples: 120192 | elapsed time per iteration (ms): 15477.1 | learning rate: 3.327E-05 | global batch size: 48 | lm loss: 6.374081E+00 | loss scale: 8192.0 | grad norm: 124767.543 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4966/ 159576 | consumed samples: 120240 | elapsed time per iteration (ms): 15922.3 | learning rate: 3.328E-05 | global batch size: 48 | lm loss: 6.338925E+00 | loss scale: 8192.0 | grad norm: 229637.846 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4967/ 159576 | consumed samples: 120288 | elapsed time per iteration (ms): 15438.9 | learning rate: 3.329E-05 | global batch size: 48 | lm loss: 6.318257E+00 | loss scale: 8192.0 | grad norm: 
138618.442 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4968/ 159576 | consumed samples: 120336 | elapsed time per iteration (ms): 15527.5 | learning rate: 3.331E-05 | global batch size: 48 | lm loss: 6.406815E+00 | loss scale: 8192.0 | grad norm: 101628.651 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4969/ 159576 | consumed samples: 120384 | elapsed time per iteration (ms): 15565.4 | learning rate: 3.332E-05 | global batch size: 48 | lm loss: 6.381866E+00 | loss scale: 8192.0 | grad norm: 138150.093 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4970/ 159576 | consumed samples: 120432 | elapsed time per iteration (ms): 15898.0 | learning rate: 3.333E-05 | global batch size: 48 | lm loss: 6.305198E+00 | loss scale: 8192.0 | grad norm: 94133.912 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4971/ 159576 | consumed samples: 120480 | elapsed time per iteration (ms): 15413.4 | learning rate: 3.335E-05 | global batch size: 48 | lm loss: 6.276737E+00 | loss scale: 8192.0 | grad norm: 89212.813 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4972/ 159576 | consumed samples: 120528 | elapsed time per iteration (ms): 15553.0 | learning rate: 3.336E-05 | global batch size: 48 | lm loss: 6.404760E+00 | loss scale: 8192.0 | grad norm: 119702.116 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4973/ 159576 | consumed samples: 120576 | elapsed time per iteration (ms): 15428.6 | learning rate: 3.337E-05 | global batch size: 48 | lm loss: 6.225817E+00 | loss scale: 8192.0 | grad norm: 94382.038 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4974/ 159576 | consumed samples: 120624 | elapsed 
time per iteration (ms): 15767.2 | learning rate: 3.339E-05 | global batch size: 48 | lm loss: 6.442757E+00 | loss scale: 8192.0 | grad norm: 99692.552 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4975/ 159576 | consumed samples: 120672 | elapsed time per iteration (ms): 15514.4 | learning rate: 3.340E-05 | global batch size: 48 | lm loss: 6.472607E+00 | loss scale: 8192.0 | grad norm: 112543.414 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4976/ 159576 | consumed samples: 120720 | elapsed time per iteration (ms): 15602.8 | learning rate: 3.341E-05 | global batch size: 48 | lm loss: 6.382205E+00 | loss scale: 8192.0 | grad norm: 97309.286 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4977/ 159576 | consumed samples: 120768 | elapsed time per iteration (ms): 15584.4 | learning rate: 3.343E-05 | global batch size: 48 | lm loss: 6.527099E+00 | loss scale: 8192.0 | grad norm: 91482.780 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4978/ 159576 | consumed samples: 120816 | elapsed time per iteration (ms): 15753.9 | learning rate: 3.344E-05 | global batch size: 48 | lm loss: 6.475079E+00 | loss scale: 8192.0 | grad norm: 167594.086 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4979/ 159576 | consumed samples: 120864 | elapsed time per iteration (ms): 15592.8 | learning rate: 3.345E-05 | global batch size: 48 | lm loss: 6.377496E+00 | loss scale: 8192.0 | grad norm: 94710.465 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4980/ 159576 | consumed samples: 120912 | elapsed time per iteration (ms): 15439.6 | learning rate: 3.347E-05 | global batch size: 48 | lm loss: 6.396212E+00 | loss scale: 8192.0 | grad norm: 82226.776 | num zeros: 
0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4981/ 159576 | consumed samples: 120960 | elapsed time per iteration (ms): 15453.4 | learning rate: 3.348E-05 | global batch size: 48 | lm loss: 6.392390E+00 | loss scale: 8192.0 | grad norm: 93532.515 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4982/ 159576 | consumed samples: 121008 | elapsed time per iteration (ms): 15623.6 | learning rate: 3.349E-05 | global batch size: 48 | lm loss: 6.384733E+00 | loss scale: 8192.0 | grad norm: 99819.245 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4983/ 159576 | consumed samples: 121056 | elapsed time per iteration (ms): 15476.4 | learning rate: 3.351E-05 | global batch size: 48 | lm loss: 6.365707E+00 | loss scale: 8192.0 | grad norm: 115195.515 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4984/ 159576 | consumed samples: 121104 | elapsed time per iteration (ms): 15519.9 | learning rate: 3.352E-05 | global batch size: 48 | lm loss: 6.280232E+00 | loss scale: 8192.0 | grad norm: 88569.976 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4985/ 159576 | consumed samples: 121152 | elapsed time per iteration (ms): 15489.3 | learning rate: 3.353E-05 | global batch size: 48 | lm loss: 6.514761E+00 | loss scale: 8192.0 | grad norm: 110101.646 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4986/ 159576 | consumed samples: 121200 | elapsed time per iteration (ms): 15582.9 | learning rate: 3.355E-05 | global batch size: 48 | lm loss: 6.394022E+00 | loss scale: 8192.0 | grad norm: 104900.137 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4987/ 159576 | consumed samples: 121248 | elapsed time per iteration (ms): 
15478.8 | learning rate: 3.356E-05 | global batch size: 48 | lm loss: 6.428993E+00 | loss scale: 8192.0 | grad norm: 99980.054 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4988/ 159576 | consumed samples: 121296 | elapsed time per iteration (ms): 15470.8 | learning rate: 3.357E-05 | global batch size: 48 | lm loss: 6.383337E+00 | loss scale: 8192.0 | grad norm: 96150.673 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4989/ 159576 | consumed samples: 121344 | elapsed time per iteration (ms): 15490.7 | learning rate: 3.359E-05 | global batch size: 48 | lm loss: 6.440140E+00 | loss scale: 8192.0 | grad norm: 99225.792 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4990/ 159576 | consumed samples: 121392 | elapsed time per iteration (ms): 16022.8 | learning rate: 3.360E-05 | global batch size: 48 | lm loss: 6.329103E+00 | loss scale: 8192.0 | grad norm: 77357.711 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4991/ 159576 | consumed samples: 121440 | elapsed time per iteration (ms): 15500.7 | learning rate: 3.361E-05 | global batch size: 48 | lm loss: 6.346808E+00 | loss scale: 8192.0 | grad norm: 83379.862 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4992/ 159576 | consumed samples: 121488 | elapsed time per iteration (ms): 15638.6 | learning rate: 3.363E-05 | global batch size: 48 | lm loss: 6.460890E+00 | loss scale: 8192.0 | grad norm: 114878.567 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4993/ 159576 | consumed samples: 121536 | elapsed time per iteration (ms): 15882.0 | learning rate: 3.364E-05 | global batch size: 48 | lm loss: 6.485402E+00 | loss scale: 8192.0 | grad norm: 164153.089 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4994/ 159576 | consumed samples: 121584 | elapsed time per iteration (ms): 15543.1 | learning rate: 3.365E-05 | global batch size: 48 | lm loss: 6.511444E+00 | loss scale: 8192.0 | grad norm: 102365.809 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4995/ 159576 | consumed samples: 121632 | elapsed time per iteration (ms): 15538.2 | learning rate: 3.367E-05 | global batch size: 48 | lm loss: 6.413379E+00 | loss scale: 8192.0 | grad norm: 115181.224 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4996/ 159576 | consumed samples: 121680 | elapsed time per iteration (ms): 15508.4 | learning rate: 3.368E-05 | global batch size: 48 | lm loss: 6.359092E+00 | loss scale: 8192.0 | grad norm: 117830.201 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4997/ 159576 | consumed samples: 121728 | elapsed time per iteration (ms): 15913.3 | learning rate: 3.369E-05 | global batch size: 48 | lm loss: 6.388143E+00 | loss scale: 8192.0 | grad norm: 90924.050 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4998/ 159576 | consumed samples: 121776 | elapsed time per iteration (ms): 15515.1 | learning rate: 3.371E-05 | global batch size: 48 | lm loss: 6.391698E+00 | loss scale: 8192.0 | grad norm: 103478.291 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4999/ 159576 | consumed samples: 121824 | elapsed time per iteration (ms): 15550.7 | learning rate: 3.372E-05 | global batch size: 48 | lm loss: 6.279563E+00 | loss scale: 8192.0 | grad norm: 129145.172 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5000/ 159576 | consumed samples: 121872 | elapsed time per iteration (ms): 15649.4 | learning 
rate: 3.373E-05 | global batch size: 48 | lm loss: 6.273400E+00 | loss scale: 8192.0 | grad norm: 105673.602 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
------------------------------------------------------------------------------------------------
validation loss at iteration 5000 | lm loss value: 6.285937E+00 | lm loss PPL: 5.369671E+02 |
------------------------------------------------------------------------------------------------
iteration 5001/ 159576 | consumed samples: 121920 | elapsed time per iteration (ms): 23355.2 | learning rate: 3.375E-05 | global batch size: 48 | lm loss: 6.432467E+00 | loss scale: 8192.0 | grad norm: 94517.127 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5002/ 159576 | consumed samples: 121968 | elapsed time per iteration (ms): 15480.8 | learning rate: 3.376E-05 | global batch size: 48 | lm loss: 6.406679E+00 | loss scale: 8192.0 | grad norm: 93989.506 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5003/ 159576 | consumed samples: 122016 | elapsed time per iteration (ms): 15462.8 | learning rate: 3.377E-05 | global batch size: 48 | lm loss: 6.425644E+00 | loss scale: 8192.0 | grad norm: 89681.033 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5004/ 159576 | consumed samples: 122064 | elapsed time per iteration (ms): 15981.7 | learning rate: 3.379E-05 | global batch size: 48 | lm loss: 6.492604E+00 | loss scale: 8192.0 | grad norm: 95165.571 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5005/ 159576 | consumed samples: 122112 | elapsed time per iteration (ms): 15437.2 | learning rate: 3.380E-05 | global batch size: 48 | lm loss: 6.335800E+00 | loss scale: 8192.0 | grad norm: 84441.007 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan
iterations: 0 | time (ms) iteration 5006/ 159576 | consumed samples: 122160 | elapsed time per iteration (ms): 15473.9 | learning rate: 3.381E-05 | global batch size: 48 | lm loss: 6.304031E+00 | loss scale: 8192.0 | grad norm: 87318.237 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5007/ 159576 | consumed samples: 122208 | elapsed time per iteration (ms): 15548.0 | learning rate: 3.383E-05 | global batch size: 48 | lm loss: 6.363890E+00 | loss scale: 8192.0 | grad norm: 92281.656 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5008/ 159576 | consumed samples: 122256 | elapsed time per iteration (ms): 15796.4 | learning rate: 3.384E-05 | global batch size: 48 | lm loss: 6.347075E+00 | loss scale: 8192.0 | grad norm: 103172.108 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5009/ 159576 | consumed samples: 122304 | elapsed time per iteration (ms): 15464.5 | learning rate: 3.385E-05 | global batch size: 48 | lm loss: 6.448061E+00 | loss scale: 8192.0 | grad norm: 95534.359 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5010/ 159576 | consumed samples: 122352 | elapsed time per iteration (ms): 15447.7 | learning rate: 3.387E-05 | global batch size: 48 | lm loss: 6.328472E+00 | loss scale: 8192.0 | grad norm: 84995.521 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5011/ 159576 | consumed samples: 122400 | elapsed time per iteration (ms): 15420.5 | learning rate: 3.388E-05 | global batch size: 48 | lm loss: 6.340866E+00 | loss scale: 8192.0 | grad norm: 82422.631 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5012/ 159576 | consumed samples: 122448 | elapsed time per iteration (ms): 15839.2 | learning rate: 3.389E-05 | global batch size: 
48 | lm loss: 6.397783E+00 | loss scale: 8192.0 | grad norm: 162057.226 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5013/ 159576 | consumed samples: 122496 | elapsed time per iteration (ms): 15565.6 | learning rate: 3.391E-05 | global batch size: 48 | lm loss: 6.363326E+00 | loss scale: 8192.0 | grad norm: 86690.137 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5014/ 159576 | consumed samples: 122544 | elapsed time per iteration (ms): 15554.7 | learning rate: 3.392E-05 | global batch size: 48 | lm loss: 6.421363E+00 | loss scale: 8192.0 | grad norm: 102318.730 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5015/ 159576 | consumed samples: 122592 | elapsed time per iteration (ms): 15616.9 | learning rate: 3.393E-05 | global batch size: 48 | lm loss: 6.322345E+00 | loss scale: 8192.0 | grad norm: 83052.732 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5016/ 159576 | consumed samples: 122640 | elapsed time per iteration (ms): 15870.8 | learning rate: 3.395E-05 | global batch size: 48 | lm loss: 6.384270E+00 | loss scale: 8192.0 | grad norm: 167288.542 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5017/ 159576 | consumed samples: 122688 | elapsed time per iteration (ms): 15476.4 | learning rate: 3.396E-05 | global batch size: 48 | lm loss: 6.423479E+00 | loss scale: 8192.0 | grad norm: 86029.728 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5018/ 159576 | consumed samples: 122736 | elapsed time per iteration (ms): 15464.3 | learning rate: 3.397E-05 | global batch size: 48 | lm loss: 6.393809E+00 | loss scale: 8192.0 | grad norm: 123082.971 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) 
iteration 5019/ 159576 | consumed samples: 122784 | elapsed time per iteration (ms): 15459.3 | learning rate: 3.399E-05 | global batch size: 48 | lm loss: 6.420121E+00 | loss scale: 8192.0 | grad norm: 82967.862 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5020/ 159576 | consumed samples: 122832 | elapsed time per iteration (ms): 15660.8 | learning rate: 3.400E-05 | global batch size: 48 | lm loss: 6.436828E+00 | loss scale: 8192.0 | grad norm: 94157.906 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
[2021-09-24 22:07:41] PULSE: tr8-104B is waiting for the previous job to finish before scheduling a new one using the dependency mechanism (1165978_[1-10%1] on 'gpu_p13' partition)
[2021-09-24 22:07:41] PULSE: tr8-104B is running for 16:15:30 since 2021-09-24T05:52:11 (1162855_1 on 'gpu_p13' partition (r6i4n[5,7],r6i5n[2,7-8],r6i6n[0,2,6],r7i2n[4-5],r7i6n[2-4],r7i7n[7-8],r8i0n[2-3,5-8],r8i1n[0,2-4],r8i2n8,r8i3n[0-2],r8i5n[3-4],r8i7n[3-8],r9i0n[0-2],r9i1n[0-3],r9i2n[3-5,8],r9i3n[0-1,7-8],r9i4n[0-2],r9i5n[3-8],r9i6n[0,7-8])
iteration 5021/ 159576 | consumed samples: 122880 | elapsed time per iteration (ms): 15506.9 | learning rate: 3.401E-05 | global batch size: 48 | lm loss: 6.230031E+00 | loss scale: 8192.0 | grad norm: 93236.907 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5022/ 159576 | consumed samples: 122928 | elapsed time per iteration (ms): 15486.4 | learning rate: 3.403E-05 | global batch size: 48 | lm loss: 6.434629E+00 | loss scale: 8192.0 | grad norm: 88122.737 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5023/ 159576 | consumed samples: 122976 | elapsed time per iteration (ms): 15558.0 | learning rate: 3.404E-05 | global batch size: 48 | lm loss: 6.447264E+00 | loss scale: 8192.0 | grad norm: 99782.616 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5024/ 159576 | consumed samples: 123024 | elapsed time per iteration (ms): 15657.7 | learning rate: 3.405E-05 | global batch size: 48 | lm loss: 6.403034E+00 | loss scale: 8192.0 | grad norm: 102592.242 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5025/ 159576 | consumed samples: 123072 | elapsed time per iteration (ms): 15429.0 | learning rate: 3.407E-05 | global batch size: 48 | lm loss: 6.433703E+00 | loss scale: 8192.0 | grad norm: 82492.534 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5026/ 159576 | consumed samples: 123120 | elapsed time per iteration (ms): 15492.8 | learning rate: 3.408E-05 | global batch size: 48 | lm loss: 6.505131E+00 | loss scale: 8192.0 | grad norm: 334700.898 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5027/ 159576 | consumed samples: 123168 | elapsed time per iteration (ms): 15456.4 | learning rate: 3.409E-05 | global batch size: 48 | lm loss: 6.312271E+00 | loss scale: 8192.0 | grad norm: 101204.541 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5028/ 159576 | consumed samples: 123216 | elapsed time per iteration (ms): 15841.8 | learning rate: 3.411E-05 | global batch size: 48 | lm loss: 6.368502E+00 | loss scale: 8192.0 | grad norm: 103816.078 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5029/ 159576 | consumed samples: 123264 | elapsed time per iteration (ms): 15474.5 | learning rate: 3.412E-05 | global batch size: 48 | lm loss: 6.350607E+00 | loss scale: 8192.0 | grad norm: 88025.860 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5030/ 159576 | consumed samples: 123312 | elapsed time per iteration (ms): 15468.9 | learning rate: 3.413E-05 | global batch size: 48 | lm loss: 6.421462E+00 | loss scale: 8192.0 | grad norm: 121501.317 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5031/ 159576 | consumed samples: 123360 | elapsed time per iteration (ms): 15894.7 | learning rate: 3.414E-05 | global batch size: 48 | lm loss: 6.452309E+00 | loss scale: 8192.0 | grad norm: 98299.215 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5032/ 159576 | consumed samples: 123408 | elapsed time per iteration (ms): 15372.6 | learning rate: 3.416E-05 | global batch size: 48 | lm loss: 6.470865E+00 | loss scale: 8192.0 | grad norm: 86033.852 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5033/ 159576 | consumed samples: 123456 | elapsed time per iteration (ms): 15386.4 | learning rate: 3.417E-05 | global batch size: 48 | lm loss: 6.358019E+00 | loss scale: 8192.0 | grad norm: 102254.964 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5034/ 159576 | consumed samples: 123504 | elapsed time per iteration (ms): 15445.3 | learning rate: 3.418E-05 | global batch size: 48 | lm loss: 6.501051E+00 | loss scale: 8192.0 | grad norm: 106902.558 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5035/ 159576 | consumed samples: 123552 | elapsed time per iteration (ms): 15687.1 | learning rate: 3.420E-05 | global batch size: 48 | lm loss: 6.441896E+00 | loss scale: 8192.0 | grad norm: 88100.171 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5036/ 159576 | consumed samples: 123600 | elapsed time per iteration (ms): 15548.9 | learning rate: 3.421E-05 | global batch size: 48 | lm loss: 6.297223E+00 | loss scale: 8192.0 | grad norm: 92260.861 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5037/ 159576 | consumed samples: 123648 | elapsed time per iteration (ms): 15475.3 | learning rate: 3.422E-05 | global batch size: 48 | lm loss: 6.382265E+00 | loss scale: 8192.0 | grad norm: 91449.043 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5038/ 159576 | consumed samples: 123696 | elapsed time per iteration (ms): 15468.3 | learning rate: 3.424E-05 | global batch size: 48 | lm loss: 6.354884E+00 | loss scale: 8192.0 | grad norm: 112737.531 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5039/ 159576 | consumed samples: 123744 | elapsed time per iteration (ms): 15758.7 | learning rate: 3.425E-05 | global batch size: 48 | lm loss: 6.504280E+00 | loss scale: 8192.0 | grad norm: 106073.818 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5040/ 159576 | consumed samples: 123792 | elapsed time per iteration (ms): 15421.0 | learning rate: 3.426E-05 | global batch size: 48 | lm loss: 6.361072E+00 | loss scale: 8192.0 | grad norm: 127074.088 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5041/ 159576 | consumed samples: 123840 | elapsed time per iteration (ms): 15385.1 | learning rate: 3.428E-05 | global batch size: 48 | lm loss: 6.289526E+00 | loss scale: 8192.0 | grad norm: 92444.062 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5042/ 159576 | consumed samples: 123888 | elapsed time per iteration (ms): 15433.3 | learning rate: 3.429E-05 | global batch size: 48 | lm loss: 6.276048E+00 | loss scale: 8192.0 | grad norm: 95460.876 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5043/ 159576 | consumed samples: 123936 | elapsed time per iteration (ms): 15839.0 | learning rate: 3.430E-05 | global batch size: 48 | lm loss: 6.447580E+00 | loss scale: 8192.0 | grad norm: 140216.976 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5044/ 159576 | consumed samples: 123984 | elapsed time per iteration (ms): 15579.5 | learning rate: 3.432E-05 | global batch size: 48 | lm loss: 6.390550E+00 | loss scale: 8192.0 | grad norm: 103110.486 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5045/ 159576 | consumed samples: 124032 | elapsed time per iteration (ms): 15508.8 | learning rate: 3.433E-05 | global batch size: 48 | lm loss: 6.326768E+00 | loss scale: 8192.0 | grad norm: 143773.143 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5046/ 159576 | consumed samples: 124080 | elapsed time per iteration (ms): 15498.6 | learning rate: 3.434E-05 | global batch size: 48 | lm loss: 6.474419E+00 | loss scale: 8192.0 | grad norm: 112141.964 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5047/ 159576 | consumed samples: 124128 | elapsed time per iteration (ms): 15657.7 | learning rate: 3.436E-05 | global batch size: 48 | lm loss: 6.411184E+00 | loss scale: 8192.0 | grad norm: 106306.407 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5048/ 159576 | consumed samples: 124176 | elapsed time per iteration (ms): 15457.2 | learning rate: 3.437E-05 | global batch size: 48 | lm loss: 6.448883E+00 | loss scale: 8192.0 | grad norm: 119234.379 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5049/ 159576 | consumed samples: 124224 | elapsed time per iteration (ms): 15413.6 | learning rate: 3.438E-05 | global batch size: 48 | lm loss: 6.307952E+00 | loss scale: 8192.0 | grad norm: 94509.642 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5050/ 159576 | consumed samples: 124272 | elapsed time per iteration (ms): 15423.5 | learning rate: 3.440E-05 | global batch size: 48 | lm loss: 6.399596E+00 | loss scale: 8192.0 | grad norm: 107196.748 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5051/ 159576 | consumed samples: 124320 | elapsed time per iteration (ms): 15555.5 | learning rate: 3.441E-05 | global batch size: 48 | lm loss: 6.345298E+00 | loss scale: 8192.0 | grad norm: 101445.269 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5052/ 159576 | consumed samples: 124368 | elapsed time per iteration (ms): 15471.9 | learning rate: 3.442E-05 | global batch size: 48 | lm loss: 6.399672E+00 | loss scale: 8192.0 | grad norm: 101071.085 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5053/ 159576 | consumed samples: 124416 | elapsed time per iteration (ms): 15538.7 | learning rate: 3.444E-05 | global batch size: 48 | lm loss: 6.306325E+00 | loss scale: 8192.0 | grad norm: 130980.614 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5054/ 159576 | consumed samples: 124464 | elapsed time per iteration (ms): 15446.5 | learning rate: 3.445E-05 | global batch size: 48 | lm loss: 6.360683E+00 | loss scale: 8192.0 | grad norm: 138731.319 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5055/ 159576 | consumed samples: 124512 | elapsed time per iteration (ms): 15548.6 | learning rate: 3.446E-05 | global batch size: 48 | lm loss: 6.415308E+00 | loss scale: 8192.0 | grad norm: 172722.048 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5056/ 159576 | consumed samples: 124560 | elapsed time per iteration (ms): 15454.2 | learning rate: 3.448E-05 | global batch size: 48 | lm loss: 6.446492E+00 | loss scale: 8192.0 | grad norm: 114779.854 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5057/ 159576 | consumed samples: 124608 | elapsed time per iteration (ms): 15531.5 | learning rate: 3.449E-05 | global batch size: 48 | lm loss: 6.352797E+00 | loss scale: 8192.0 | grad norm: 93911.343 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5058/ 159576 | consumed samples: 124656 | elapsed time per iteration (ms): 15916.6 | learning rate: 3.450E-05 | global batch size: 48 | lm loss: 6.394308E+00 | loss scale: 8192.0 | grad norm: 122896.031 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5059/ 159576 | consumed samples: 124704 | elapsed time per iteration (ms): 15639.0 | learning rate: 3.452E-05 | global batch size: 48 | lm loss: 6.497361E+00 | loss scale: 8192.0 | grad norm: 111301.411 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5060/ 159576 | consumed samples: 124752 | elapsed time per iteration (ms): 15585.9 | learning rate: 3.453E-05 | global batch size: 48 | lm loss: 6.416485E+00 | loss scale: 8192.0 | grad norm: 111209.944 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5061/ 159576 | consumed samples: 124800 | elapsed time per iteration (ms): 15476.2 | learning rate: 3.454E-05 | global batch size: 48 | lm loss: 6.385825E+00 | loss scale: 8192.0 | grad norm: 124134.940 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5062/ 159576 | consumed samples: 124848 | elapsed time per iteration (ms): 15734.0 | learning rate: 3.456E-05 | global batch size: 48 | lm loss: 6.419828E+00 | loss scale: 8192.0 | grad norm: 115134.207 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5063/ 159576 | consumed samples: 124896 | elapsed time per iteration (ms): 15427.5 | learning rate: 3.457E-05 | global batch size: 48 | lm loss: 6.501984E+00 | loss scale: 8192.0 | grad norm: 94348.909 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5064/ 159576 | consumed samples: 124944 | elapsed time per iteration (ms): 15367.7 | learning rate: 3.458E-05 | global batch size: 48 | lm loss: 6.435040E+00 | loss scale: 8192.0 | grad norm: 107056.513 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5065/ 159576 | consumed samples: 124992 | elapsed time per iteration (ms): 15376.7 | learning rate: 3.460E-05 | global batch size: 48 | lm loss: 6.347174E+00 | loss scale: 8192.0 | grad norm: 107513.355 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5066/ 159576 | consumed samples: 125040 | elapsed time per iteration (ms): 15861.2 | learning rate: 3.461E-05 | global batch size: 48 | lm loss: 6.473555E+00 | loss scale: 8192.0 | grad norm: 96134.607 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5067/ 159576 | consumed samples: 125088 | elapsed time per iteration (ms): 15376.8 | learning rate: 3.462E-05 | global batch size: 48 | lm loss: 6.364458E+00 | loss scale: 8192.0 | grad norm: 110987.016 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5068/ 159576 | consumed samples: 125136 | elapsed time per iteration (ms): 15511.1 | learning rate: 3.464E-05 | global batch size: 48 | lm loss: 6.441058E+00 | loss scale: 8192.0 | grad norm: 135931.030 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5069/ 159576 | consumed samples: 125184 | elapsed time per iteration (ms): 15475.4 | learning rate: 3.465E-05 | global batch size: 48 | lm loss: 6.324648E+00 | loss scale: 8192.0 | grad norm: 108716.373 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5070/ 159576 | consumed samples: 125232 | elapsed time per iteration (ms): 15862.4 | learning rate: 3.466E-05 | global batch size: 48 | lm loss: 6.318436E+00 | loss scale: 8192.0 | grad norm: 103967.885 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5071/ 159576 | consumed samples: 125280 | elapsed time per iteration (ms): 15504.6 | learning rate: 3.468E-05 | global batch size: 48 | lm loss: 6.395255E+00 | loss scale: 8192.0 | grad norm: 108399.090 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5072/ 159576 | consumed samples: 125328 | elapsed time per iteration (ms): 15377.1 | learning rate: 3.469E-05 | global batch size: 48 | lm loss: 6.379922E+00 | loss scale: 8192.0 | grad norm: 103462.577 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5073/ 159576 | consumed samples: 125376 | elapsed time per iteration (ms): 15411.3 | learning rate: 3.470E-05 | global batch size: 48 | lm loss: 6.396028E+00 | loss scale: 8192.0 | grad norm: 95480.077 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5074/ 159576 | consumed samples: 125424 | elapsed time per iteration (ms): 15799.1 | learning rate: 3.472E-05 | global batch size: 48 | lm loss: 6.413391E+00 | loss scale: 8192.0 | grad norm: 150193.560 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5075/ 159576 | consumed samples: 125472 | elapsed time per iteration (ms): 15638.7 | learning rate: 3.473E-05 | global batch size: 48 | lm loss: 6.308775E+00 | loss scale: 8192.0 | grad norm: 129289.081 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5076/ 159576 | consumed samples: 125520 | elapsed time per iteration (ms): 15490.0 | learning rate: 3.474E-05 | global batch size: 48 | lm loss: 6.273424E+00 | loss scale: 8192.0 | grad norm: 137408.576 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5077/ 159576 | consumed samples: 125568 | elapsed time per iteration (ms): 15408.8 | learning rate: 3.476E-05 | global batch size: 48 | lm loss: 6.402836E+00 | loss scale: 8192.0 | grad norm: 549435.371 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5078/ 159576 | consumed samples: 125616 | elapsed time per iteration (ms): 15586.3 | learning rate: 3.477E-05 | global batch size: 48 | lm loss: 6.309762E+00 | loss scale: 8192.0 | grad norm: 104483.336 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5079/ 159576 | consumed samples: 125664 | elapsed time per iteration (ms): 15542.8 | learning rate: 3.478E-05 | global batch size: 48 | lm loss: 6.315629E+00 | loss scale: 8192.0 | grad norm: 91616.745 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5080/ 159576 | consumed samples: 125712 | elapsed time per iteration (ms): 15472.1 | learning rate: 3.480E-05 | global batch size: 48 | lm loss: 6.554045E+00 | loss scale: 8192.0 | grad norm: 172370.169 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5081/ 159576 | consumed samples: 125760 | elapsed time per iteration (ms): 15563.9 | learning rate: 3.481E-05 | global batch size: 48 | lm loss: 6.355201E+00 | loss scale: 8192.0 | grad norm: 125519.190 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5082/ 159576 | consumed samples: 125808 | elapsed time per iteration (ms): 15777.1 | learning rate: 3.482E-05 | global batch size: 48 | lm loss: 6.435748E+00 | loss scale: 8192.0 | grad norm: 122698.397 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5083/ 159576 | consumed samples: 125856 | elapsed time per iteration (ms): 15566.4 | learning rate: 3.484E-05 | global batch size: 48 | lm loss: 6.269705E+00 | loss scale: 8192.0 | grad norm: 120100.832 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5084/ 159576 | consumed samples: 125904 | elapsed time per iteration (ms): 15633.9 | learning rate: 3.485E-05 | global batch size: 48 | lm loss: 6.357334E+00 | loss scale: 8192.0 | grad norm: 98996.539 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5085/ 159576 | consumed samples: 125952 | elapsed time per iteration (ms): 15985.6 | learning rate: 3.486E-05 | global batch size: 48 | lm loss: 6.393430E+00 | loss scale: 8192.0 | grad norm: 96935.838 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5086/ 159576 | consumed samples: 126000 | elapsed time per iteration (ms): 15483.1 | learning rate: 3.488E-05 | global batch size: 48 | lm loss: 6.307817E+00 | loss scale: 8192.0 | grad norm: 105392.466 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5087/ 159576 | consumed samples: 126048 | elapsed time per iteration (ms): 15492.6 | learning rate: 3.489E-05 | global batch size: 48 | lm loss: 6.307018E+00 | loss scale: 8192.0 | grad norm: 119838.229 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5088/ 159576 | consumed samples: 126096 | elapsed time per iteration (ms): 15510.3 | learning rate: 3.490E-05 | global batch size: 48 | lm loss: 6.400391E+00 | loss scale: 8192.0 | grad norm: 124265.428 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5089/ 159576 | consumed samples: 126144 | elapsed time per iteration (ms): 15885.9 | learning rate: 3.492E-05 | global batch size: 48 | lm loss: 6.333194E+00 | loss scale: 8192.0 | grad norm: 115702.613 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5090/ 159576 | consumed samples: 126192 | elapsed time per iteration (ms): 15544.2 | learning rate: 3.493E-05 | global batch size: 48 | lm loss: 6.331620E+00 | loss scale: 8192.0 | grad norm: 137239.041 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5091/ 159576 | consumed samples: 126240 | elapsed time per iteration (ms): 15557.8 | learning rate: 3.494E-05 | global batch size: 48 | lm loss: 6.437903E+00 | loss scale: 8192.0 | grad norm: 233688.200 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5092/ 159576 | consumed samples: 126288 | elapsed time per iteration (ms): 15511.8 | learning rate: 3.496E-05 | global batch size: 48 | lm loss: 6.421580E+00 | loss scale: 8192.0 | grad norm: 127898.243 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5093/ 159576 | consumed samples: 126336 | elapsed time per iteration (ms): 16146.9 | learning rate: 3.497E-05 | global batch size: 48 | lm loss: 6.348750E+00 | loss scale: 8192.0 | grad norm: 200287.595 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5094/ 159576 | consumed samples: 126384 | elapsed time per iteration (ms): 15650.7 | learning rate: 3.498E-05 | global batch size: 48 | lm loss: 6.384042E+00 | loss scale: 8192.0 | grad norm: 141808.196 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5095/ 159576 | consumed samples: 126432 | elapsed time per iteration (ms): 15549.8 | learning rate: 3.500E-05 | global batch size: 48 | lm loss: 6.380728E+00 | loss scale: 8192.0 | grad norm: 113750.693 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5096/ 159576 | consumed samples: 126480 | elapsed time per iteration (ms): 15494.8 | learning rate: 3.501E-05 | global batch size: 48 | lm loss: 6.329007E+00 | loss scale: 8192.0 | grad norm: 142607.603 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5097/ 159576 | consumed samples: 126528 | elapsed time per iteration (ms): 15805.4 | learning rate: 3.502E-05 | global batch size: 48 | lm loss: 6.331810E+00 | loss scale: 8192.0 | grad norm: 125989.326 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5098/ 159576 | consumed samples: 126576 | elapsed time per iteration (ms): 15560.8 | learning rate: 3.504E-05 | global batch size: 48 | lm loss: 6.349818E+00 | loss scale: 8192.0 | grad norm: 164955.758 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5099/ 159576 | consumed samples: 126624 | elapsed time per iteration (ms): 15574.8 | learning rate: 3.505E-05 | global batch size: 48 | lm loss: 6.511029E+00 | loss scale: 8192.0 | grad norm: 150219.938 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5100/ 159576 | consumed samples: 126672 | elapsed time per iteration (ms): 15588.9 | learning rate: 3.506E-05 | global batch size: 48 | lm loss: 6.365673E+00 | loss scale: 8192.0 | grad norm: 132801.144 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5101/ 159576 | consumed samples: 126720 | elapsed time per iteration (ms): 15620.0 | learning rate: 3.508E-05 | global batch size: 48 | lm loss: 6.393438E+00 | loss scale: 8192.0 | grad norm: 181251.963 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5102/ 159576 | consumed samples: 126768 | elapsed time per iteration (ms): 15489.4 | learning rate: 3.509E-05 | global batch size: 48 | lm loss: 6.416411E+00 | loss scale: 8192.0 | grad norm: 117102.206 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5103/ 159576 | consumed samples: 126816 | elapsed time per iteration (ms): 15557.2 | learning rate: 3.510E-05 | global batch size: 48 | lm loss: 6.328413E+00 | loss scale: 8192.0 | grad norm: 187671.141 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5104/ 159576 | consumed samples: 126864 | elapsed time per iteration (ms): 15527.6 | learning rate: 3.512E-05 | global batch size: 48 | lm loss: 6.465903E+00 | loss scale: 8192.0 | grad norm: 190613.302 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5105/ 159576 | consumed samples: 126912 | elapsed time per iteration (ms): 8977.0 | learning rate: 3.512E-05 | global batch size: 48 | lm loss: 6.508333E+00 | loss scale: 4096.0 | grad norm: 190613.302 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5106/ 159576 | consumed samples: 126960 | elapsed time per iteration (ms): 15010.8 | learning rate: 3.513E-05 | global batch size: 48 | lm loss: 6.436017E+00 | loss scale: 4096.0 | grad norm: 59199.826 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5107/ 159576 | consumed samples: 127008 | elapsed time per iteration (ms): 15527.1 | learning rate: 3.514E-05 | global batch size: 48 | lm loss: 6.357530E+00 | loss scale: 4096.0 | grad norm: 72710.163 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5108/ 159576 | consumed samples: 127056 | elapsed time per iteration (ms): 15496.3 | learning rate: 3.516E-05 | global batch size: 48 | lm loss: 6.394055E+00 | loss scale: 4096.0 | grad norm: 94748.377 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5109/ 159576 | consumed samples: 127104 | elapsed time per iteration (ms): 15957.2 | learning rate: 3.517E-05 | global batch size: 48 | lm loss: 6.443262E+00 | loss scale: 4096.0 | grad norm: 61224.800 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5110/ 159576 | consumed samples: 127152 | elapsed time per iteration (ms): 15587.8 | learning rate: 3.518E-05 | global batch size: 48 | lm loss: 6.400789E+00 | loss scale: 4096.0 | grad norm: 97179.001 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5111/ 159576 | consumed samples: 127200 | elapsed time per iteration (ms): 15522.6 | learning rate: 3.520E-05 | global batch size: 48 | lm loss: 6.368151E+00 | loss scale: 4096.0 | grad norm: 103211.934 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5112/ 159576 | consumed samples: 127248 | elapsed time per iteration (ms): 15555.5 | learning rate: 3.521E-05 | global batch size: 48 | lm loss: 6.389073E+00 | loss scale: 4096.0 | grad norm: 68143.489 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5113/ 159576 | consumed samples: 127296 | elapsed time per iteration (ms): 15672.8 | learning rate: 3.522E-05 | global batch size: 48 | lm loss: 6.453850E+00 | loss scale: 4096.0 | grad norm: 80102.261 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5114/ 159576 | consumed samples: 127344 | elapsed time per iteration (ms): 15462.8 | learning rate: 3.524E-05 | global batch size: 48 | lm loss: 6.448624E+00 | loss scale: 4096.0 | grad norm: 79184.876 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5115/ 159576 | consumed samples: 127392 | elapsed time per iteration (ms): 15488.2 | learning rate: 3.525E-05 | global batch size: 48 | lm loss: 6.440034E+00 | loss scale: 4096.0 | grad norm: 65278.408 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5116/ 159576 | consumed samples: 127440 | elapsed time per iteration (ms): 15517.5 | learning rate: 3.526E-05 | global batch size: 48 | lm loss: 6.452240E+00 | loss scale: 4096.0 | grad norm: 81154.588 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5117/ 159576 | consumed samples: 127488 | elapsed time per iteration (ms): 15650.3 | learning rate: 3.528E-05 | global batch size: 48 | lm loss: 6.352810E+00 | loss scale: 4096.0 | grad norm: 70667.188 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5118/ 159576 | consumed samples: 127536 | elapsed time per iteration (ms): 15553.2 | learning rate: 3.529E-05 | global batch size: 48 | lm loss: 6.422338E+00 | loss scale: 4096.0 | grad norm: 76003.454 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5119/ 159576 | consumed samples: 127584 | elapsed time per iteration (ms): 15525.1 | learning rate: 3.530E-05 | global batch size: 48 | lm loss: 6.345719E+00 | loss scale: 4096.0 | grad norm: 75153.995 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5120/ 159576 | consumed samples: 127632 | elapsed time per iteration (ms): 15941.5 | learning rate: 3.532E-05 | global batch size: 48 | lm loss: 6.406080E+00 | loss scale: 4096.0 | grad norm: 61393.266 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5121/ 159576 | consumed samples: 127680 | elapsed time per iteration (ms): 15581.4 | learning rate: 3.533E-05 | global batch size: 48 | lm loss: 6.333064E+00 | loss scale: 4096.0 | grad norm: 84273.861 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5122/ 159576 | consumed samples: 127728 | elapsed time per iteration (ms): 15534.4 | learning rate: 3.534E-05 | global batch size: 48 | lm loss: 6.430450E+00 | loss scale: 4096.0 | grad norm: 71025.296 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5123/ 159576 | consumed samples: 127776 | elapsed time per iteration (ms): 15491.5 | learning rate: 3.536E-05 | global batch size: 48 | lm loss: 6.372457E+00 | loss scale: 4096.0 | grad norm: 60958.441 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5124/ 159576 | consumed samples: 127824 | elapsed time per iteration (ms): 15825.8 | learning rate: 3.537E-05 | global batch size: 48 | lm loss: 6.359689E+00 | loss scale: 4096.0 | grad norm: 69184.583 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5125/ 159576 | consumed samples: 127872 | elapsed time per iteration (ms): 15572.0 | learning rate: 3.538E-05 | global batch size: 48 | lm loss: 6.354432E+00 | loss scale: 4096.0 | grad norm: 81726.924 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5126/ 159576 | consumed samples: 127920 | elapsed time per iteration (ms): 15546.1 | learning rate: 3.540E-05 | global batch size: 48 | lm loss: 6.383263E+00 | loss scale: 4096.0 | grad norm: 67932.048 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5127/ 159576 | consumed samples: 127968 | elapsed time per iteration (ms): 15512.5 | learning rate: 3.541E-05 | global batch size: 48 | lm loss: 6.323973E+00 | loss scale: 4096.0 | grad norm: 69551.089 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5128/ 159576 | consumed samples: 128016 | elapsed time per iteration (ms): 15872.2 | learning rate: 3.542E-05 | global batch size: 48 | lm loss: 6.384116E+00 | loss scale: 4096.0 | grad norm: 66160.808 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5129/ 159576 | consumed samples: 128064 | elapsed time per iteration (ms): 15540.5 | learning rate: 3.544E-05 | global batch size: 48 | lm loss: 6.273410E+00 | loss scale: 4096.0 | grad norm: 68712.154 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5130/ 159576 | consumed samples: 128112 | elapsed time per iteration (ms): 15510.9 | learning rate: 3.545E-05 | global batch size: 48 | lm loss: 6.393827E+00 | loss scale: 4096.0 | grad norm: 80347.476 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5131/ 159576 | consumed samples: 128160 | elapsed time per iteration (ms): 15546.9 | learning rate: 3.546E-05 | global batch size: 48 | lm loss: 6.494912E+00 | loss scale: 4096.0 | grad norm: 79601.794 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5132/ 159576 | consumed samples: 128208 | elapsed time per iteration (ms): 15850.8 | learning rate: 3.548E-05 | global batch size: 48 | lm loss: 6.363180E+00 | loss scale: 4096.0 | grad norm: 59957.448 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5133/ 159576 | consumed samples: 128256 | elapsed time per iteration (ms): 15572.0 | learning rate: 3.549E-05 | global batch size: 48 | lm loss: 6.361386E+00 | loss scale: 4096.0 | grad norm: 65589.795 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5134/ 159576 | consumed samples: 128304 | elapsed time per iteration (ms): 15554.8 | learning rate: 3.550E-05 | global batch size: 48 | lm loss: 6.338229E+00 | loss scale: 4096.0 | grad norm: 70953.865 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan
3.542E-05 | global batch size: 48 | lm loss: 6.384116E+00 | loss scale: 4096.0 | grad norm: 66160.808 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5129/ 159576 | consumed samples: 128064 | elapsed time per iteration (ms): 15540.5 | learning rate: 3.544E-05 | global batch size: 48 | lm loss: 6.273410E+00 | loss scale: 4096.0 | grad norm: 68712.154 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5130/ 159576 | consumed samples: 128112 | elapsed time per iteration (ms): 15510.9 | learning rate: 3.545E-05 | global batch size: 48 | lm loss: 6.393827E+00 | loss scale: 4096.0 | grad norm: 80347.476 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5131/ 159576 | consumed samples: 128160 | elapsed time per iteration (ms): 15546.9 | learning rate: 3.546E-05 | global batch size: 48 | lm loss: 6.494912E+00 | loss scale: 4096.0 | grad norm: 79601.794 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5132/ 159576 | consumed samples: 128208 | elapsed time per iteration (ms): 15850.8 | learning rate: 3.548E-05 | global batch size: 48 | lm loss: 6.363180E+00 | loss scale: 4096.0 | grad norm: 59957.448 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5133/ 159576 | consumed samples: 128256 | elapsed time per iteration (ms): 15572.0 | learning rate: 3.549E-05 | global batch size: 48 | lm loss: 6.361386E+00 | loss scale: 4096.0 | grad norm: 65589.795 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5134/ 159576 | consumed samples: 128304 | elapsed time per iteration (ms): 15554.8 | learning rate: 3.550E-05 | global batch size: 48 | lm loss: 6.338229E+00 | loss scale: 4096.0 | grad norm: 70953.865 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan 
iterations: 0 | time (ms) iteration 5135/ 159576 | consumed samples: 128352 | elapsed time per iteration (ms): 15508.1 | learning rate: 3.552E-05 | global batch size: 48 | lm loss: 6.265258E+00 | loss scale: 4096.0 | grad norm: 101476.397 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5136/ 159576 | consumed samples: 128400 | elapsed time per iteration (ms): 15713.9 | learning rate: 3.553E-05 | global batch size: 48 | lm loss: 6.443205E+00 | loss scale: 4096.0 | grad norm: 70676.423 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5137/ 159576 | consumed samples: 128448 | elapsed time per iteration (ms): 15500.3 | learning rate: 3.554E-05 | global batch size: 48 | lm loss: 6.297948E+00 | loss scale: 4096.0 | grad norm: 50734.773 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5138/ 159576 | consumed samples: 128496 | elapsed time per iteration (ms): 15505.3 | learning rate: 3.556E-05 | global batch size: 48 | lm loss: 6.343609E+00 | loss scale: 4096.0 | grad norm: 67207.942 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5139/ 159576 | consumed samples: 128544 | elapsed time per iteration (ms): 15531.1 | learning rate: 3.557E-05 | global batch size: 48 | lm loss: 6.422406E+00 | loss scale: 4096.0 | grad norm: 50444.513 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5140/ 159576 | consumed samples: 128592 | elapsed time per iteration (ms): 15679.9 | learning rate: 3.558E-05 | global batch size: 48 | lm loss: 6.377341E+00 | loss scale: 4096.0 | grad norm: 71866.018 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5141/ 159576 | consumed samples: 128640 | elapsed time per iteration (ms): 15549.3 | learning rate: 3.560E-05 | global batch size: 
48 | lm loss: 6.403359E+00 | loss scale: 4096.0 | grad norm: 64942.411 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5142/ 159576 | consumed samples: 128688 | elapsed time per iteration (ms): 15525.2 | learning rate: 3.561E-05 | global batch size: 48 | lm loss: 6.390831E+00 | loss scale: 4096.0 | grad norm: 66674.644 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5143/ 159576 | consumed samples: 128736 | elapsed time per iteration (ms): 15540.8 | learning rate: 3.562E-05 | global batch size: 48 | lm loss: 6.391725E+00 | loss scale: 4096.0 | grad norm: 59980.301 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5144/ 159576 | consumed samples: 128784 | elapsed time per iteration (ms): 15885.0 | learning rate: 3.564E-05 | global batch size: 48 | lm loss: 6.459509E+00 | loss scale: 4096.0 | grad norm: 136366.609 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5145/ 159576 | consumed samples: 128832 | elapsed time per iteration (ms): 15452.0 | learning rate: 3.565E-05 | global batch size: 48 | lm loss: 6.528796E+00 | loss scale: 4096.0 | grad norm: 82183.349 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5146/ 159576 | consumed samples: 128880 | elapsed time per iteration (ms): 15509.1 | learning rate: 3.566E-05 | global batch size: 48 | lm loss: 6.420625E+00 | loss scale: 4096.0 | grad norm: 69812.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5147/ 159576 | consumed samples: 128928 | elapsed time per iteration (ms): 15918.9 | learning rate: 3.568E-05 | global batch size: 48 | lm loss: 6.436305E+00 | loss scale: 4096.0 | grad norm: 63955.498 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) 
iteration 5148/ 159576 | consumed samples: 128976 | elapsed time per iteration (ms): 15526.4 | learning rate: 3.569E-05 | global batch size: 48 | lm loss: 6.339918E+00 | loss scale: 4096.0 | grad norm: 56857.758 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5149/ 159576 | consumed samples: 129024 | elapsed time per iteration (ms): 15529.0 | learning rate: 3.570E-05 | global batch size: 48 | lm loss: 6.345021E+00 | loss scale: 4096.0 | grad norm: 93115.718 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5150/ 159576 | consumed samples: 129072 | elapsed time per iteration (ms): 15542.6 | learning rate: 3.572E-05 | global batch size: 48 | lm loss: 6.311335E+00 | loss scale: 4096.0 | grad norm: 61629.742 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5151/ 159576 | consumed samples: 129120 | elapsed time per iteration (ms): 15904.0 | learning rate: 3.573E-05 | global batch size: 48 | lm loss: 6.397278E+00 | loss scale: 4096.0 | grad norm: 65208.827 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5152/ 159576 | consumed samples: 129168 | elapsed time per iteration (ms): 15450.1 | learning rate: 3.574E-05 | global batch size: 48 | lm loss: 6.345972E+00 | loss scale: 4096.0 | grad norm: 72003.182 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5153/ 159576 | consumed samples: 129216 | elapsed time per iteration (ms): 15533.3 | learning rate: 3.576E-05 | global batch size: 48 | lm loss: 6.411428E+00 | loss scale: 4096.0 | grad norm: 105237.969 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5154/ 159576 | consumed samples: 129264 | elapsed time per iteration (ms): 15505.2 | learning rate: 3.577E-05 | global batch size: 48 | lm loss: 6.320354E+00 
| loss scale: 4096.0 | grad norm: 101458.750 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5155/ 159576 | consumed samples: 129312 | elapsed time per iteration (ms): 15994.4 | learning rate: 3.578E-05 | global batch size: 48 | lm loss: 6.453386E+00 | loss scale: 4096.0 | grad norm: 118215.611 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5156/ 159576 | consumed samples: 129360 | elapsed time per iteration (ms): 15565.8 | learning rate: 3.580E-05 | global batch size: 48 | lm loss: 6.443649E+00 | loss scale: 4096.0 | grad norm: 72691.423 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5157/ 159576 | consumed samples: 129408 | elapsed time per iteration (ms): 15539.2 | learning rate: 3.581E-05 | global batch size: 48 | lm loss: 6.528984E+00 | loss scale: 4096.0 | grad norm: 72165.791 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5158/ 159576 | consumed samples: 129456 | elapsed time per iteration (ms): 15536.3 | learning rate: 3.582E-05 | global batch size: 48 | lm loss: 6.398818E+00 | loss scale: 4096.0 | grad norm: 69046.921 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5159/ 159576 | consumed samples: 129504 | elapsed time per iteration (ms): 15739.5 | learning rate: 3.584E-05 | global batch size: 48 | lm loss: 6.384636E+00 | loss scale: 4096.0 | grad norm: 65721.319 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5160/ 159576 | consumed samples: 129552 | elapsed time per iteration (ms): 15530.3 | learning rate: 3.585E-05 | global batch size: 48 | lm loss: 6.340583E+00 | loss scale: 4096.0 | grad norm: 70984.261 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5161/ 159576 | 
consumed samples: 129600 | elapsed time per iteration (ms): 15537.1 | learning rate: 3.586E-05 | global batch size: 48 | lm loss: 6.299366E+00 | loss scale: 4096.0 | grad norm: 120531.429 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5162/ 159576 | consumed samples: 129648 | elapsed time per iteration (ms): 15525.1 | learning rate: 3.588E-05 | global batch size: 48 | lm loss: 6.422726E+00 | loss scale: 4096.0 | grad norm: 80943.603 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5163/ 159576 | consumed samples: 129696 | elapsed time per iteration (ms): 15737.7 | learning rate: 3.589E-05 | global batch size: 48 | lm loss: 6.343781E+00 | loss scale: 4096.0 | grad norm: 62800.221 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5164/ 159576 | consumed samples: 129744 | elapsed time per iteration (ms): 15570.2 | learning rate: 3.590E-05 | global batch size: 48 | lm loss: 6.478961E+00 | loss scale: 4096.0 | grad norm: 49279.442 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5165/ 159576 | consumed samples: 129792 | elapsed time per iteration (ms): 15467.9 | learning rate: 3.592E-05 | global batch size: 48 | lm loss: 6.465704E+00 | loss scale: 4096.0 | grad norm: 56608.697 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5166/ 159576 | consumed samples: 129840 | elapsed time per iteration (ms): 15511.0 | learning rate: 3.593E-05 | global batch size: 48 | lm loss: 6.389446E+00 | loss scale: 4096.0 | grad norm: 64287.210 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5167/ 159576 | consumed samples: 129888 | elapsed time per iteration (ms): 15650.0 | learning rate: 3.594E-05 | global batch size: 48 | lm loss: 6.432152E+00 | loss scale: 4096.0 | 
grad norm: 68389.100 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5168/ 159576 | consumed samples: 129936 | elapsed time per iteration (ms): 15501.5 | learning rate: 3.596E-05 | global batch size: 48 | lm loss: 6.311705E+00 | loss scale: 4096.0 | grad norm: 60127.301 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5169/ 159576 | consumed samples: 129984 | elapsed time per iteration (ms): 15500.0 | learning rate: 3.597E-05 | global batch size: 48 | lm loss: 6.459386E+00 | loss scale: 4096.0 | grad norm: 193850.992 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5170/ 159576 | consumed samples: 130032 | elapsed time per iteration (ms): 15853.5 | learning rate: 3.598E-05 | global batch size: 48 | lm loss: 6.359794E+00 | loss scale: 4096.0 | grad norm: 201400.324 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5171/ 159576 | consumed samples: 130080 | elapsed time per iteration (ms): 15565.6 | learning rate: 3.600E-05 | global batch size: 48 | lm loss: 6.447841E+00 | loss scale: 4096.0 | grad norm: 60758.011 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5172/ 159576 | consumed samples: 130128 | elapsed time per iteration (ms): 15439.0 | learning rate: 3.601E-05 | global batch size: 48 | lm loss: 6.390144E+00 | loss scale: 4096.0 | grad norm: 60173.953 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5173/ 159576 | consumed samples: 130176 | elapsed time per iteration (ms): 15512.4 | learning rate: 3.602E-05 | global batch size: 48 | lm loss: 6.471553E+00 | loss scale: 4096.0 | grad norm: 65209.828 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5174/ 159576 | consumed samples: 130224 | 
elapsed time per iteration (ms): 15753.1 | learning rate: 3.604E-05 | global batch size: 48 | lm loss: 6.363354E+00 | loss scale: 4096.0 | grad norm: 66471.065 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5175/ 159576 | consumed samples: 130272 | elapsed time per iteration (ms): 15415.5 | learning rate: 3.605E-05 | global batch size: 48 | lm loss: 6.418964E+00 | loss scale: 4096.0 | grad norm: 63654.751 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5176/ 159576 | consumed samples: 130320 | elapsed time per iteration (ms): 15469.1 | learning rate: 3.606E-05 | global batch size: 48 | lm loss: 6.357801E+00 | loss scale: 4096.0 | grad norm: 82288.957 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5177/ 159576 | consumed samples: 130368 | elapsed time per iteration (ms): 15407.1 | learning rate: 3.608E-05 | global batch size: 48 | lm loss: 6.479723E+00 | loss scale: 4096.0 | grad norm: 63508.625 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5178/ 159576 | consumed samples: 130416 | elapsed time per iteration (ms): 15785.1 | learning rate: 3.609E-05 | global batch size: 48 | lm loss: 6.532706E+00 | loss scale: 4096.0 | grad norm: 62734.072 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5179/ 159576 | consumed samples: 130464 | elapsed time per iteration (ms): 15467.8 | learning rate: 3.610E-05 | global batch size: 48 | lm loss: 6.442670E+00 | loss scale: 4096.0 | grad norm: 64963.382 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5180/ 159576 | consumed samples: 130512 | elapsed time per iteration (ms): 15479.5 | learning rate: 3.612E-05 | global batch size: 48 | lm loss: 6.373410E+00 | loss scale: 4096.0 | grad norm: 62492.194 | num 
zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5181/ 159576 | consumed samples: 130560 | elapsed time per iteration (ms): 15413.5 | learning rate: 3.613E-05 | global batch size: 48 | lm loss: 6.442731E+00 | loss scale: 4096.0 | grad norm: 93654.611 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5182/ 159576 | consumed samples: 130608 | elapsed time per iteration (ms): 15788.0 | learning rate: 3.614E-05 | global batch size: 48 | lm loss: 6.356236E+00 | loss scale: 4096.0 | grad norm: 77133.068 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5183/ 159576 | consumed samples: 130656 | elapsed time per iteration (ms): 15436.5 | learning rate: 3.616E-05 | global batch size: 48 | lm loss: 6.321268E+00 | loss scale: 4096.0 | grad norm: 138010.507 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5184/ 159576 | consumed samples: 130704 | elapsed time per iteration (ms): 15417.0 | learning rate: 3.617E-05 | global batch size: 48 | lm loss: 6.463357E+00 | loss scale: 4096.0 | grad norm: 67977.572 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5185/ 159576 | consumed samples: 130752 | elapsed time per iteration (ms): 15399.1 | learning rate: 3.618E-05 | global batch size: 48 | lm loss: 6.369720E+00 | loss scale: 4096.0 | grad norm: 73939.997 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5186/ 159576 | consumed samples: 130800 | elapsed time per iteration (ms): 15682.4 | learning rate: 3.620E-05 | global batch size: 48 | lm loss: 6.404753E+00 | loss scale: 4096.0 | grad norm: 71441.970 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5187/ 159576 | consumed samples: 130848 | elapsed time per iteration 
(ms): 15500.0 | learning rate: 3.621E-05 | global batch size: 48 | lm loss: 6.418368E+00 | loss scale: 4096.0 | grad norm: 85130.256 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5188/ 159576 | consumed samples: 130896 | elapsed time per iteration (ms): 15437.0 | learning rate: 3.622E-05 | global batch size: 48 | lm loss: 6.391647E+00 | loss scale: 4096.0 | grad norm: 66283.229 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5189/ 159576 | consumed samples: 130944 | elapsed time per iteration (ms): 15475.7 | learning rate: 3.624E-05 | global batch size: 48 | lm loss: 6.322616E+00 | loss scale: 4096.0 | grad norm: 75047.649 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5190/ 159576 | consumed samples: 130992 | elapsed time per iteration (ms): 15579.8 | learning rate: 3.625E-05 | global batch size: 48 | lm loss: 6.431418E+00 | loss scale: 4096.0 | grad norm: 58908.817 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5191/ 159576 | consumed samples: 131040 | elapsed time per iteration (ms): 15429.7 | learning rate: 3.626E-05 | global batch size: 48 | lm loss: 6.535919E+00 | loss scale: 4096.0 | grad norm: 122859.857 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5192/ 159576 | consumed samples: 131088 | elapsed time per iteration (ms): 15437.2 | learning rate: 3.628E-05 | global batch size: 48 | lm loss: 6.220134E+00 | loss scale: 4096.0 | grad norm: 92437.561 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5193/ 159576 | consumed samples: 131136 | elapsed time per iteration (ms): 15429.8 | learning rate: 3.629E-05 | global batch size: 48 | lm loss: 6.373948E+00 | loss scale: 4096.0 | grad norm: 93116.737 | num zeros: 0.0 | number of 
skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5194/ 159576 | consumed samples: 131184 | elapsed time per iteration (ms): 15588.8 | learning rate: 3.630E-05 | global batch size: 48 | lm loss: 6.390661E+00 | loss scale: 4096.0 | grad norm: 64520.956 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5195/ 159576 | consumed samples: 131232 | elapsed time per iteration (ms): 15414.6 | learning rate: 3.632E-05 | global batch size: 48 | lm loss: 6.359470E+00 | loss scale: 4096.0 | grad norm: 61039.424 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5196/ 159576 | consumed samples: 131280 | elapsed time per iteration (ms): 15469.0 | learning rate: 3.633E-05 | global batch size: 48 | lm loss: 6.426967E+00 | loss scale: 4096.0 | grad norm: 69860.175 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5197/ 159576 | consumed samples: 131328 | elapsed time per iteration (ms): 15399.3 | learning rate: 3.634E-05 | global batch size: 48 | lm loss: 6.397369E+00 | loss scale: 4096.0 | grad norm: 67025.925 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5198/ 159576 | consumed samples: 131376 | elapsed time per iteration (ms): 15852.9 | learning rate: 3.636E-05 | global batch size: 48 | lm loss: 6.470811E+00 | loss scale: 4096.0 | grad norm: 94172.614 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5199/ 159576 | consumed samples: 131424 | elapsed time per iteration (ms): 15428.8 | learning rate: 3.637E-05 | global batch size: 48 | lm loss: 6.341267E+00 | loss scale: 4096.0 | grad norm: 73918.814 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5200/ 159576 | consumed samples: 131472 | elapsed time per iteration (ms): 15444.1 | learning 
rate: 3.638E-05 | global batch size: 48 | lm loss: 6.434019E+00 | loss scale: 4096.0 | grad norm: 107373.139 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5201/ 159576 | consumed samples: 131520 | elapsed time per iteration (ms): 15807.8 | learning rate: 3.639E-05 | global batch size: 48 | lm loss: 6.288959E+00 | loss scale: 4096.0 | grad norm: 60538.434 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5202/ 159576 | consumed samples: 131568 | elapsed time per iteration (ms): 15428.1 | learning rate: 3.641E-05 | global batch size: 48 | lm loss: 6.382991E+00 | loss scale: 4096.0 | grad norm: 87744.726 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5203/ 159576 | consumed samples: 131616 | elapsed time per iteration (ms): 15473.7 | learning rate: 3.642E-05 | global batch size: 48 | lm loss: 6.421006E+00 | loss scale: 4096.0 | grad norm: 63743.211 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5204/ 159576 | consumed samples: 131664 | elapsed time per iteration (ms): 15342.5 | learning rate: 3.643E-05 | global batch size: 48 | lm loss: 6.345580E+00 | loss scale: 4096.0 | grad norm: 83317.459 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5205/ 159576 | consumed samples: 131712 | elapsed time per iteration (ms): 15751.6 | learning rate: 3.645E-05 | global batch size: 48 | lm loss: 6.379266E+00 | loss scale: 4096.0 | grad norm: 72285.964 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5206/ 159576 | consumed samples: 131760 | elapsed time per iteration (ms): 15391.2 | learning rate: 3.646E-05 | global batch size: 48 | lm loss: 6.296494E+00 | loss scale: 4096.0 | grad norm: 99774.130 | num zeros: 0.0 | number of skipped iterations: 0 | number 
of nan iterations: 0 | time (ms) iteration 5207/ 159576 | consumed samples: 131808 | elapsed time per iteration (ms): 15463.8 | learning rate: 3.647E-05 | global batch size: 48 | lm loss: 6.419320E+00 | loss scale: 4096.0 | grad norm: 76787.605 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5208/ 159576 | consumed samples: 131856 | elapsed time per iteration (ms): 15457.9 | learning rate: 3.649E-05 | global batch size: 48 | lm loss: 6.321754E+00 | loss scale: 4096.0 | grad norm: 71044.606 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5209/ 159576 | consumed samples: 131904 | elapsed time per iteration (ms): 15812.3 | learning rate: 3.650E-05 | global batch size: 48 | lm loss: 6.295812E+00 | loss scale: 4096.0 | grad norm: 80278.535 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5210/ 159576 | consumed samples: 131952 | elapsed time per iteration (ms): 15416.3 | learning rate: 3.651E-05 | global batch size: 48 | lm loss: 6.444015E+00 | loss scale: 4096.0 | grad norm: 69086.077 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5211/ 159576 | consumed samples: 132000 | elapsed time per iteration (ms): 15496.5 | learning rate: 3.653E-05 | global batch size: 48 | lm loss: 6.426943E+00 | loss scale: 4096.0 | grad norm: 87922.534 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5212/ 159576 | consumed samples: 132048 | elapsed time per iteration (ms): 15327.0 | learning rate: 3.654E-05 | global batch size: 48 | lm loss: 6.361041E+00 | loss scale: 4096.0 | grad norm: 68686.112 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5213/ 159576 | consumed samples: 132096 | elapsed time per iteration (ms): 15936.5 | learning rate: 3.655E-05 | global batch 
size: 48 | lm loss: 6.389860E+00 | loss scale: 4096.0 | grad norm: 68529.242 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5214/ 159576 | consumed samples: 132144 | elapsed time per iteration (ms): 15542.2 | learning rate: 3.657E-05 | global batch size: 48 | lm loss: 6.395509E+00 | loss scale: 4096.0 | grad norm: 66332.216 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5215/ 159576 | consumed samples: 132192 | elapsed time per iteration (ms): 15481.3 | learning rate: 3.658E-05 | global batch size: 48 | lm loss: 6.378184E+00 | loss scale: 4096.0 | grad norm: 69005.077 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5216/ 159576 | consumed samples: 132240 | elapsed time per iteration (ms): 15471.0 | learning rate: 3.659E-05 | global batch size: 48 | lm loss: 6.409903E+00 | loss scale: 4096.0 | grad norm: 78238.545 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5217/ 159576 | consumed samples: 132288 | elapsed time per iteration (ms): 15765.5 | learning rate: 3.661E-05 | global batch size: 48 | lm loss: 6.468248E+00 | loss scale: 4096.0 | grad norm: 81260.175 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5218/ 159576 | consumed samples: 132336 | elapsed time per iteration (ms): 15514.7 | learning rate: 3.662E-05 | global batch size: 48 | lm loss: 6.462075E+00 | loss scale: 4096.0 | grad norm: 89591.763 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5219/ 159576 | consumed samples: 132384 | elapsed time per iteration (ms): 15488.0 | learning rate: 3.663E-05 | global batch size: 48 | lm loss: 6.402821E+00 | loss scale: 4096.0 | grad norm: 67243.019 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) 
iteration 5220/ 159576 | consumed samples: 132432 | elapsed time per iteration (ms): 15443.2 | learning rate: 3.665E-05 | global batch size: 48 | lm loss: 6.377299E+00 | loss scale: 4096.0 | grad norm: 73909.640 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5221/ 159576 | consumed samples: 132480 | elapsed time per iteration (ms): 15695.0 | learning rate: 3.666E-05 | global batch size: 48 | lm loss: 6.451472E+00 | loss scale: 4096.0 | grad norm: 66658.049 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5222/ 159576 | consumed samples: 132528 | elapsed time per iteration (ms): 15480.5 | learning rate: 3.667E-05 | global batch size: 48 | lm loss: 6.465474E+00 | loss scale: 4096.0 | grad norm: 71303.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5223/ 159576 | consumed samples: 132576 | elapsed time per iteration (ms): 15538.4 | learning rate: 3.669E-05 | global batch size: 48 | lm loss: 6.452018E+00 | loss scale: 4096.0 | grad norm: 61632.620 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5224/ 159576 | consumed samples: 132624 | elapsed time per iteration (ms): 15433.6 | learning rate: 3.670E-05 | global batch size: 48 | lm loss: 6.417565E+00 | loss scale: 4096.0 | grad norm: 99052.706 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5225/ 159576 | consumed samples: 132672 | elapsed time per iteration (ms): 16019.0 | learning rate: 3.671E-05 | global batch size: 48 | lm loss: 6.392467E+00 | loss scale: 4096.0 | grad norm: 81901.168 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5226/ 159576 | consumed samples: 132720 | elapsed time per iteration (ms): 15479.0 | learning rate: 3.673E-05 | global batch size: 48 | lm loss: 6.432102E+00 | 
loss scale: 4096.0 | grad norm: 80603.914 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5227/ 159576 | consumed samples: 132768 | elapsed time per iteration (ms): 15499.4 | learning rate: 3.674E-05 | global batch size: 48 | lm loss: 6.304895E+00 | loss scale: 4096.0 | grad norm: 63916.075 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5228/ 159576 | consumed samples: 132816 | elapsed time per iteration (ms): 15774.2 | learning rate: 3.675E-05 | global batch size: 48 | lm loss: 6.323613E+00 | loss scale: 4096.0 | grad norm: 76694.249 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5229/ 159576 | consumed samples: 132864 | elapsed time per iteration (ms): 15599.1 | learning rate: 3.677E-05 | global batch size: 48 | lm loss: 6.488564E+00 | loss scale: 4096.0 | grad norm: 76280.931 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5230/ 159576 | consumed samples: 132912 | elapsed time per iteration (ms): 15549.2 | learning rate: 3.678E-05 | global batch size: 48 | lm loss: 6.430355E+00 | loss scale: 4096.0 | grad norm: 71462.889 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5231/ 159576 | consumed samples: 132960 | elapsed time per iteration (ms): 15501.3 | learning rate: 3.679E-05 | global batch size: 48 | lm loss: 6.493622E+00 | loss scale: 4096.0 | grad norm: 59853.872 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5232/ 159576 | consumed samples: 133008 | elapsed time per iteration (ms): 15779.3 | learning rate: 3.681E-05 | global batch size: 48 | lm loss: 6.284019E+00 | loss scale: 4096.0 | grad norm: 69496.678 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5233/ 159576 | consumed samples: 133056 | elapsed time per iteration (ms): 15428.5 | learning rate: 3.682E-05 | global batch size: 48 | lm loss: 6.267179E+00 | loss scale: 4096.0 | grad norm: 63245.018 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5234/ 159576 | consumed samples: 133104 | elapsed time per iteration (ms): 15461.3 | learning rate: 3.683E-05 | global batch size: 48 | lm loss: 6.449612E+00 | loss scale: 4096.0 | grad norm: 78199.189 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5235/ 159576 | consumed samples: 133152 | elapsed time per iteration (ms): 15485.3 | learning rate: 3.685E-05 | global batch size: 48 | lm loss: 6.443536E+00 | loss scale: 4096.0 | grad norm: 70168.271 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5236/ 159576 | consumed samples: 133200 | elapsed time per iteration (ms): 15933.7 | learning rate: 3.686E-05 | global batch size: 48 | lm loss: 6.244983E+00 | loss scale: 4096.0 | grad norm: 75166.513 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5237/ 159576 | consumed samples: 133248 | elapsed time per iteration (ms): 15418.0 | learning rate: 3.687E-05 | global batch size: 48 | lm loss: 6.283341E+00 | loss scale: 4096.0 | grad norm: 72463.714 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5238/ 159576 | consumed samples: 133296 | elapsed time per iteration (ms): 15549.2 | learning rate: 3.689E-05 | global batch size: 48 | lm loss: 6.438685E+00 | loss scale: 4096.0 | grad norm: 82352.679 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5239/ 159576 | consumed samples: 133344 | elapsed time per iteration (ms): 15537.2 | learning rate: 3.690E-05 | global batch size: 48 | lm loss: 6.362652E+00 | loss scale: 4096.0 | grad norm: 70918.803 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5240/ 159576 | consumed samples: 133392 | elapsed time per iteration (ms): 15840.0 | learning rate: 3.691E-05 | global batch size: 48 | lm loss: 6.368175E+00 | loss scale: 4096.0 | grad norm: 155104.639 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5241/ 159576 | consumed samples: 133440 | elapsed time per iteration (ms): 15490.2 | learning rate: 3.693E-05 | global batch size: 48 | lm loss: 6.400668E+00 | loss scale: 4096.0 | grad norm: 68076.314 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5242/ 159576 | consumed samples: 133488 | elapsed time per iteration (ms): 15382.4 | learning rate: 3.694E-05 | global batch size: 48 | lm loss: 6.316941E+00 | loss scale: 4096.0 | grad norm: 57901.587 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5243/ 159576 | consumed samples: 133536 | elapsed time per iteration (ms): 15382.2 | learning rate: 3.695E-05 | global batch size: 48 | lm loss: 6.494829E+00 | loss scale: 4096.0 | grad norm: 62287.898 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5244/ 159576 | consumed samples: 133584 | elapsed time per iteration (ms): 15661.6 | learning rate: 3.697E-05 | global batch size: 48 | lm loss: 6.397869E+00 | loss scale: 4096.0 | grad norm: 57367.212 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5245/ 159576 | consumed samples: 133632 | elapsed time per iteration (ms): 15495.8 | learning rate: 3.698E-05 | global batch size: 48 | lm loss: 6.256347E+00 | loss scale: 4096.0 | grad norm: 61800.740 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5246/ 159576 | consumed samples: 133680 | elapsed time per iteration (ms): 15523.0 | learning rate: 3.699E-05 | global batch size: 48 | lm loss: 6.389894E+00 | loss scale: 4096.0 | grad norm: 69126.659 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5247/ 159576 | consumed samples: 133728 | elapsed time per iteration (ms): 15546.9 | learning rate: 3.701E-05 | global batch size: 48 | lm loss: 6.346736E+00 | loss scale: 4096.0 | grad norm: 67046.324 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5248/ 159576 | consumed samples: 133776 | elapsed time per iteration (ms): 15650.8 | learning rate: 3.702E-05 | global batch size: 48 | lm loss: 6.430111E+00 | loss scale: 4096.0 | grad norm: 69265.741 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5249/ 159576 | consumed samples: 133824 | elapsed time per iteration (ms): 15490.3 | learning rate: 3.703E-05 | global batch size: 48 | lm loss: 6.390760E+00 | loss scale: 4096.0 | grad norm: 142507.174 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5250/ 159576 | consumed samples: 133872 | elapsed time per iteration (ms): 15521.8 | learning rate: 3.705E-05 | global batch size: 48 | lm loss: 6.420756E+00 | loss scale: 4096.0 | grad norm: 64815.993 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5251/ 159576 | consumed samples: 133920 | elapsed time per iteration (ms): 15759.8 | learning rate: 3.706E-05 | global batch size: 48 | lm loss: 6.360211E+00 | loss scale: 4096.0 | grad norm: 82700.295 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5252/ 159576 | consumed samples: 133968 | elapsed time per iteration (ms): 15623.8 | learning rate: 3.707E-05 | global batch size: 48 | lm loss: 6.351006E+00 | loss scale: 4096.0 | grad norm: 70030.998 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
[2021-09-24 23:07:37] PULSE: tr8-104B is waiting for the previous job to finish before scheduling a new one using the dependency mechanism (1165978_[1-10%1] on 'gpu_p13' partition)
[2021-09-24 23:07:37] PULSE: tr8-104B is running for 17:15:26 since 2021-09-24T05:52:11 (1162855_1 on 'gpu_p13' partition (r6i4n[5,7],r6i5n[2,7-8],r6i6n[0,2,6],r7i2n[4-5],r7i6n[2-4],r7i7n[7-8],r8i0n[2-3,5-8],r8i1n[0,2-4],r8i2n8,r8i3n[0-2],r8i5n[3-4],r8i7n[3-8],r9i0n[0-2],r9i1n[0-3],r9i2n[3-5,8],r9i3n[0-1,7-8],r9i4n[0-2],r9i5n[3-8],r9i6n[0,7-8])
iteration 5253/ 159576 | consumed samples: 134016 | elapsed time per iteration (ms): 15553.2 | learning rate: 3.709E-05 | global batch size: 48 | lm loss: 6.395989E+00 | loss scale: 4096.0 | grad norm: 75934.711 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5254/ 159576 | consumed samples: 134064 | elapsed time per iteration (ms): 15521.6 | learning rate: 3.710E-05 | global batch size: 48 | lm loss: 6.388237E+00 | loss scale: 4096.0 | grad norm: 85225.047 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5255/ 159576 | consumed samples: 134112 | elapsed time per iteration (ms): 15886.3 | learning rate: 3.711E-05 | global batch size: 48 | lm loss: 6.348703E+00 | loss scale: 4096.0 | grad norm: 72802.836 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5256/ 159576 | consumed samples: 134160 | elapsed time per iteration (ms): 15520.3 | learning rate: 3.713E-05 | global batch size: 48 | lm loss: 6.321572E+00 | loss scale: 4096.0 | grad norm: 73245.874 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5257/ 159576 | consumed samples: 134208 | elapsed time per iteration (ms): 15443.7 | learning rate: 3.714E-05 | global batch size: 48 | lm loss: 6.335665E+00 | loss scale: 4096.0 | grad norm: 58798.760 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5258/ 159576 | consumed samples: 134256 | elapsed time per iteration (ms): 15427.0 | learning rate: 3.715E-05 | global batch size: 48 | lm loss: 6.319070E+00 | loss scale: 4096.0 | grad norm: 66591.391 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5259/ 159576 | consumed samples: 134304 | elapsed time per iteration (ms): 15760.6 | learning rate: 3.717E-05 | global batch size: 48 | lm loss: 6.229961E+00 | loss scale: 4096.0 | grad norm: 78411.623 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5260/ 159576 | consumed samples: 134352 | elapsed time per iteration (ms): 15544.0 | learning rate: 3.718E-05 | global batch size: 48 | lm loss: 6.379896E+00 | loss scale: 4096.0 | grad norm: 82294.960 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5261/ 159576 | consumed samples: 134400 | elapsed time per iteration (ms): 15397.8 | learning rate: 3.719E-05 | global batch size: 48 | lm loss: 6.233184E+00 | loss scale: 4096.0 | grad norm: 65525.586 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5262/ 159576 | consumed samples: 134448 | elapsed time per iteration (ms): 15498.3 | learning rate: 3.721E-05 | global batch size: 48 | lm loss: 6.326461E+00 | loss scale: 4096.0 | grad norm: 101232.286 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5263/ 159576 | consumed samples: 134496 | elapsed time per iteration (ms): 15834.8 | learning rate: 3.722E-05 | global batch size: 48 | lm loss: 6.351873E+00 | loss scale: 4096.0 | grad norm: 82652.498 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5264/ 159576 | consumed samples: 134544 | elapsed time per iteration (ms): 15450.4 | learning rate: 3.723E-05 | global batch size: 48 | lm loss: 6.411518E+00 | loss scale: 4096.0 | grad norm: 79704.233 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5265/ 159576 | consumed samples: 134592 | elapsed time per iteration (ms): 15408.5 | learning rate: 3.725E-05 | global batch size: 48 | lm loss: 6.324855E+00 | loss scale: 4096.0 | grad norm: 96783.723 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5266/ 159576 | consumed samples: 134640 | elapsed time per iteration (ms): 15369.4 | learning rate: 3.726E-05 | global batch size: 48 | lm loss: 6.351592E+00 | loss scale: 4096.0 | grad norm: 96231.447 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5267/ 159576 | consumed samples: 134688 | elapsed time per iteration (ms): 15643.8 | learning rate: 3.727E-05 | global batch size: 48 | lm loss: 6.439371E+00 | loss scale: 4096.0 | grad norm: 86165.942 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5268/ 159576 | consumed samples: 134736 | elapsed time per iteration (ms): 15428.0 | learning rate: 3.729E-05 | global batch size: 48 | lm loss: 6.282881E+00 | loss scale: 4096.0 | grad norm: 95370.085 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5269/ 159576 | consumed samples: 134784 | elapsed time per iteration (ms): 15422.7 | learning rate: 3.730E-05 | global batch size: 48 | lm loss: 6.489480E+00 | loss scale: 4096.0 | grad norm: 77407.640 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5270/ 159576 | consumed samples: 134832 | elapsed time per iteration (ms): 15384.0 | learning rate: 3.731E-05 | global batch size: 48 | lm loss: 6.382200E+00 | loss scale: 4096.0 | grad norm: 66716.315 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5271/ 159576 | consumed samples: 134880 | elapsed time per iteration (ms): 15581.8 | learning rate: 3.733E-05 | global batch size: 48 | lm loss: 6.409722E+00 | loss scale: 4096.0 | grad norm: 68218.526 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5272/ 159576 | consumed samples: 134928 | elapsed time per iteration (ms): 15395.7 | learning rate: 3.734E-05 | global batch size: 48 | lm loss: 6.493249E+00 | loss scale: 4096.0 | grad norm: 71580.496 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5273/ 159576 | consumed samples: 134976 | elapsed time per iteration (ms): 15402.4 | learning rate: 3.735E-05 | global batch size: 48 | lm loss: 6.376624E+00 | loss scale: 4096.0 | grad norm: 85075.910 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5274/ 159576 | consumed samples: 135024 | elapsed time per iteration (ms): 15424.2 | learning rate: 3.737E-05 | global batch size: 48 | lm loss: 6.441435E+00 | loss scale: 4096.0 | grad norm: 75286.225 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5275/ 159576 | consumed samples: 135072 | elapsed time per iteration (ms): 15616.5 | learning rate: 3.738E-05 | global batch size: 48 | lm loss: 6.428281E+00 | loss scale: 4096.0 | grad norm: 71317.497 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5276/ 159576 | consumed samples: 135120 | elapsed time per iteration (ms): 15383.8 | learning rate: 3.739E-05 | global batch size: 48 | lm loss: 6.324539E+00 | loss scale: 4096.0 | grad norm: 70509.208 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5277/ 159576 | consumed samples: 135168 | elapsed time per iteration (ms): 15404.4 | learning rate: 3.741E-05 | global batch size: 48 | lm loss: 6.396560E+00 | loss scale: 4096.0 | grad norm: 68223.773 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5278/ 159576 | consumed samples: 135216 | elapsed time per iteration (ms): 15464.0 | learning rate: 3.742E-05 | global batch size: 48 | lm loss: 6.403405E+00 | loss scale: 4096.0 | grad norm: 74828.040 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5279/ 159576 | consumed samples: 135264 | elapsed time per iteration (ms): 15572.0 | learning rate: 3.743E-05 | global batch size: 48 | lm loss: 6.340907E+00 | loss scale: 4096.0 | grad norm: 103719.466 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5280/ 159576 | consumed samples: 135312 | elapsed time per iteration (ms): 15390.1 | learning rate: 3.745E-05 | global batch size: 48 | lm loss: 6.465801E+00 | loss scale: 4096.0 | grad norm: 71954.053 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5281/ 159576 | consumed samples: 135360 | elapsed time per iteration (ms): 15379.3 | learning rate: 3.746E-05 | global batch size: 48 | lm loss: 6.481463E+00 | loss scale: 4096.0 | grad norm: 64156.580 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5282/ 159576 | consumed samples: 135408 | elapsed time per iteration (ms): 15880.0 | learning rate: 3.747E-05 | global batch size: 48 | lm loss: 6.324627E+00 | loss scale: 4096.0 | grad norm: 77974.806 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5283/ 159576 | consumed samples: 135456 | elapsed time per iteration (ms): 15461.2 | learning rate: 3.749E-05 | global batch size: 48 | lm loss: 6.278036E+00 | loss scale: 4096.0 | grad norm: 78417.449 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5284/ 159576 | consumed samples: 135504 | elapsed time per iteration (ms): 15434.3 | learning rate: 3.750E-05 | global batch size: 48 | lm loss: 6.470399E+00 | loss scale: 4096.0 | grad norm: 70677.576 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5285/ 159576 | consumed samples: 135552 | elapsed time per iteration (ms): 15453.3 | learning rate: 3.751E-05 | global batch size: 48 | lm loss: 6.465354E+00 | loss scale: 4096.0 | grad norm: 72699.042 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5286/ 159576 | consumed samples: 135600 | elapsed time per iteration (ms): 15799.4 | learning rate: 3.753E-05 | global batch size: 48 | lm loss: 6.366466E+00 | loss scale: 4096.0 | grad norm: 87890.137 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5287/ 159576 | consumed samples: 135648 | elapsed time per iteration (ms): 15462.6 | learning rate: 3.754E-05 | global batch size: 48 | lm loss: 6.450302E+00 | loss scale: 4096.0 | grad norm: 65500.276 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5288/ 159576 | consumed samples: 135696 | elapsed time per iteration (ms): 15449.3 | learning rate: 3.755E-05 | global batch size: 48 | lm loss: 6.211058E+00 | loss scale: 4096.0 | grad norm: 91309.432 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5289/ 159576 | consumed samples: 135744 | elapsed time per iteration (ms): 15440.0 | learning rate: 3.757E-05 | global batch size: 48 | lm loss: 6.439297E+00 | loss scale: 4096.0 | grad norm: 78139.415 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5290/ 159576 | consumed samples: 135792 | elapsed time per iteration (ms): 15759.6 | learning rate: 3.758E-05 | global batch size: 48 | lm loss: 6.295393E+00 | loss scale: 4096.0 | grad norm: 67343.216 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5291/ 159576 | consumed samples: 135840 | elapsed time per iteration (ms): 15513.6 | learning rate: 3.759E-05 | global batch size: 48 | lm loss: 6.403075E+00 | loss scale: 4096.0 | grad norm: 88227.795 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5292/ 159576 | consumed samples: 135888 | elapsed time per iteration (ms): 15421.3 | learning rate: 3.761E-05 | global batch size: 48 | lm loss: 6.414333E+00 | loss scale: 4096.0 | grad norm: 78788.254 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5293/ 159576 | consumed samples: 135936 | elapsed time per iteration (ms): 15345.3 | learning rate: 3.762E-05 | global batch size: 48 | lm loss: 6.292488E+00 | loss scale: 4096.0 | grad norm: 59708.880 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5294/ 159576 | consumed samples: 135984 | elapsed time per iteration (ms): 16027.7 | learning rate: 3.763E-05 | global batch size: 48 | lm loss: 6.385753E+00 | loss scale: 4096.0 | grad norm: 102775.204 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5295/ 159576 | consumed samples: 136032 | elapsed time per iteration (ms): 15461.5 | learning rate: 3.765E-05 | global batch size: 48 | lm loss: 6.324437E+00 | loss scale: 4096.0 | grad norm: 71697.534 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5296/ 159576 | consumed samples: 136080 | elapsed time per iteration (ms): 15433.9 | learning rate: 3.766E-05 | global batch size: 48 | lm loss: 6.384956E+00 | loss scale: 4096.0 | grad norm: 102953.672 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5297/ 159576 | consumed samples: 136128 | elapsed time per iteration (ms): 15429.7 | learning rate: 3.767E-05 | global batch size: 48 | lm loss: 6.436825E+00 | loss scale: 4096.0 | grad norm: 75031.086 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5298/ 159576 | consumed samples: 136176 | elapsed time per iteration (ms): 15818.4 | learning rate: 3.769E-05 | global batch size: 48 | lm loss: 6.482272E+00 | loss scale: 4096.0 | grad norm: 65276.986 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5299/ 159576 | consumed samples: 136224 | elapsed time per iteration (ms): 15441.5 | learning rate: 3.770E-05 | global batch size: 48 | lm loss: 6.589076E+00 | loss scale: 4096.0 | grad norm: 121561.959 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5300/ 159576 | consumed samples: 136272 | elapsed time per iteration (ms): 15422.2 | learning rate: 3.771E-05 | global batch size: 48 | lm loss: 6.405668E+00 | loss scale: 4096.0 | grad norm: 62093.972 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5301/ 159576 | consumed samples: 136320 | elapsed time per iteration (ms): 15355.0 | learning rate: 3.773E-05 | global batch size: 48 | lm loss: 6.390646E+00 | loss scale: 4096.0 | grad norm: 56038.998 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5302/ 159576 | consumed samples: 136368 | elapsed time per iteration (ms): 15565.3 | learning rate: 3.774E-05 | global batch size: 48 | lm loss: 6.410752E+00 | loss scale: 4096.0 | grad norm: 64581.105 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5303/ 159576 | consumed samples: 136416 | elapsed time per iteration (ms): 15422.3 | learning rate: 3.775E-05 | global batch size: 48 | lm loss: 6.448494E+00 | loss scale: 4096.0 | grad norm: 77740.769 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5304/ 159576 | consumed samples: 136464 | elapsed time per iteration (ms): 15454.6 | learning rate: 3.777E-05 | global batch size: 48 | lm loss: 6.436998E+00 | loss scale: 4096.0 | grad norm: 86587.477 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5305/ 159576 | consumed samples: 136512 | elapsed time per iteration (ms): 15410.7 | learning rate: 3.778E-05 | global batch size: 48 | lm loss: 6.360906E+00 | loss scale: 4096.0 | grad norm: 102483.307 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5306/ 159576 | consumed samples: 136560 | elapsed time per iteration (ms): 15590.5 | learning rate: 3.779E-05 | global batch size: 48 | lm loss: 6.449046E+00 | loss scale: 4096.0 | grad norm: 63898.529 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5307/ 159576 | consumed samples: 136608 | elapsed time per iteration (ms): 15506.8 | learning rate: 3.781E-05 | global batch size: 48 | lm loss: 6.467348E+00 | loss scale: 4096.0 | grad norm: 66863.281 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5308/ 159576 | consumed samples: 136656 | elapsed time per iteration (ms): 15351.0 | learning rate: 3.782E-05 | global batch size: 48 | lm loss: 6.301440E+00 | loss scale: 4096.0 | grad norm: 66038.590 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5309/ 159576 | consumed samples: 136704 | elapsed time per iteration (ms): 15547.1 | learning rate: 3.783E-05 | global batch size: 48 | lm loss: 6.314401E+00 | loss scale: 4096.0 | grad norm: 100622.046 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5310/ 159576 | consumed samples: 136752 | elapsed time per iteration (ms): 15714.1 | learning rate: 3.785E-05 | global batch size: 48 | lm loss: 6.474138E+00 | loss scale: 4096.0 | grad norm: 100713.919 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5311/ 159576 | consumed samples: 136800 | elapsed time per iteration (ms): 15441.4 | learning rate: 3.786E-05 | global batch size: 48 | lm loss: 6.429978E+00 | loss scale: 4096.0 | grad norm: 73118.420 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5312/ 159576 | consumed samples: 136848 | elapsed time per iteration (ms): 15448.2 | learning rate: 3.787E-05 | global batch size: 48 | lm loss: 6.322928E+00 | loss scale: 4096.0 | grad norm: 79244.189 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5313/ 159576 | consumed samples: 136896 | elapsed time per iteration (ms): 15801.3 | learning rate: 3.789E-05 | global batch size: 48 | lm loss: 6.536728E+00 | loss scale: 4096.0 | grad norm: 80004.821 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5314/ 159576 | consumed samples: 136944 | elapsed time per iteration (ms): 15420.7 | learning rate: 3.790E-05 | global batch size: 48 | lm loss: 6.358313E+00 | loss scale: 4096.0 | grad norm: 73656.992 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5315/ 159576 | consumed samples: 136992 | elapsed time per iteration (ms): 15430.5 | learning rate: 3.791E-05 | global batch size: 48 | lm loss: 6.285139E+00 | loss scale: 4096.0 | grad norm: 72555.490 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5316/ 159576 | consumed samples: 137040 | elapsed time per iteration (ms): 15418.3 | learning rate: 3.793E-05 | global batch size: 48 | lm loss: 6.355993E+00 | loss scale: 4096.0 | grad norm: 89604.868 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5317/ 159576 | consumed samples: 137088 | elapsed time per iteration (ms): 15767.6 | learning rate: 3.794E-05 | global batch size: 48 | lm loss: 6.370296E+00 | loss scale: 4096.0 | grad norm: 68760.061 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5318/ 159576 | consumed samples: 137136 | elapsed time per iteration (ms): 15469.0 | learning rate: 3.795E-05 | global batch size: 48 | lm loss: 6.401207E+00 | loss scale: 4096.0 | grad norm: 64825.425 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5319/ 159576 | consumed samples: 137184 | elapsed time per iteration (ms): 15469.4 | learning rate: 3.797E-05 | global batch size: 48 | lm loss: 6.433188E+00 | loss scale: 4096.0 | grad norm: 75954.384 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5320/ 159576 | consumed samples: 137232 | elapsed time per iteration (ms): 15484.0 | learning rate: 3.798E-05 | global batch size: 48 | lm loss: 6.422481E+00 | loss scale: 4096.0 | grad norm: 85143.261 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5321/ 159576 | consumed samples: 137280 | elapsed time per iteration (ms): 15773.2 | learning rate: 3.799E-05 | global batch size: 48 | lm loss: 6.394318E+00 | loss scale: 4096.0 | grad norm: 81431.726 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5322/ 159576 | consumed samples: 137328 | elapsed time per iteration (ms): 15339.5 | learning rate: 3.801E-05 | global batch size: 48 | lm loss: 6.498918E+00 | loss scale: 4096.0 | grad norm: 76418.870 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5323/ 159576 | consumed samples: 137376 | elapsed time per iteration (ms): 15420.7 | learning rate: 3.802E-05 | global batch size: 48 | lm loss: 6.518599E+00 | loss scale: 4096.0 | grad norm: 71705.255 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5324/ 159576 | consumed samples: 137424 | elapsed time per iteration (ms): 15420.3 | learning rate: 3.803E-05 | global batch size: 48 | lm loss: 6.429631E+00 | loss scale: 4096.0 | grad norm: 57358.188 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5325/ 159576 | consumed samples: 137472 | elapsed time per iteration (ms): 15903.1 | learning rate: 3.805E-05 | global batch size: 48 | lm loss: 6.407781E+00 | loss scale: 4096.0 | grad norm: 91506.505 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5326/ 159576 | consumed samples: 137520 | elapsed time per iteration (ms): 15425.4 | learning rate: 3.806E-05 | global batch size: 48 | lm loss: 6.399868E+00 | loss scale: 4096.0 | grad norm: 68843.352 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5327/ 159576 | consumed samples: 137568 | elapsed time per iteration (ms): 15444.3 | learning rate: 3.807E-05 | global batch size: 48 | lm loss: 6.412372E+00 | loss scale: 4096.0 | grad norm: 67149.711 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5328/ 159576 | consumed samples: 137616 | elapsed time per iteration (ms): 15406.6 | learning rate: 3.809E-05 | global batch size: 48 | lm loss: 6.430699E+00 | loss scale: 4096.0 | grad norm: 102742.719 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5329/ 159576 | consumed samples: 137664 | elapsed time per iteration (ms): 15722.7 | learning rate: 3.810E-05 | global batch size: 48 | lm loss: 6.415520E+00 | loss scale: 4096.0 | grad norm: 73301.472 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5330/ 159576 | consumed samples: 137712 | elapsed time per iteration (ms): 15405.0 | learning rate: 3.811E-05 | global batch size: 48 | lm loss: 6.359590E+00 | loss scale: 4096.0 | grad norm: 70222.523 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5331/ 159576 | consumed samples: 137760 | elapsed time per iteration (ms): 15374.6 | learning rate: 3.813E-05 | global batch size: 48 | lm loss: 6.443409E+00 | loss scale: 4096.0 | grad norm: 79619.657 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5332/ 159576 | consumed samples: 137808 | elapsed time per iteration (ms): 15404.3 | learning rate: 3.814E-05 | global batch size: 48 | lm loss: 6.412749E+00 | loss scale: 4096.0 | grad norm: 110889.514 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5333/ 159576 | consumed samples: 137856 | elapsed time per iteration (ms): 15590.4 | learning rate: 3.815E-05 | global batch size: 48 | lm loss: 6.492513E+00 | loss scale: 4096.0 | grad norm: 80255.448 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5334/ 159576 | consumed samples: 137904 | elapsed time per iteration (ms): 15436.5 | learning rate: 3.817E-05 | global batch size: 48 | lm loss: 6.400149E+00 | loss scale: 4096.0 | grad norm: 69554.344 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5335/ 159576 | consumed samples: 137952 | elapsed time per iteration (ms): 15422.0 | learning rate: 3.818E-05 | global batch size: 48 | lm loss: 6.473186E+00 | loss scale: 4096.0 | grad norm: 96185.543 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5336/ 159576 | consumed samples: 138000 | elapsed time per iteration (ms): 15442.7 | learning rate: 3.819E-05 | global batch size: 48 | lm loss: 6.552884E+00 | loss scale: 4096.0 | grad norm: 73254.921 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5337/ 159576 | consumed samples: 138048 | elapsed time per iteration (ms): 15634.6 | learning rate: 3.821E-05 | global batch size: 48 | lm loss: 6.365612E+00 | loss scale: 4096.0 | grad norm: 57539.381 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5338/ 159576 | consumed samples: 138096 | elapsed time per iteration (ms): 15386.8 | learning rate: 3.822E-05 | global batch size: 48 | lm loss: 6.445109E+00 | loss scale: 4096.0 | grad norm: 67382.289 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5339/ 159576 | consumed samples: 138144 | elapsed time per iteration (ms): 15470.1 | learning rate: 3.823E-05 | global batch size: 48 | lm loss: 6.353713E+00 | loss scale: 4096.0 | grad norm: 110272.660 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5340/ 159576 | consumed samples: 138192 | elapsed time per iteration (ms): 15791.0 | learning rate: 3.825E-05 | global batch size: 48 | lm loss: 6.413539E+00 | loss scale: 4096.0 | grad norm: 72349.998 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5341/ 159576 | consumed samples: 138240 | elapsed time per iteration (ms): 15411.4 | learning rate: 3.826E-05 | global batch size: 48 | lm loss: 6.347322E+00 | loss scale: 4096.0 | grad norm: 61859.125 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5342/ 159576 | consumed samples: 138288 | elapsed time per iteration (ms): 15471.9 | learning rate: 3.827E-05 | global batch size: 48 | lm loss: 6.298682E+00 | loss scale: 4096.0 | grad norm: 78125.812 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5343/ 159576 | consumed samples: 138336 | elapsed time per iteration (ms): 15450.5 | learning rate: 3.829E-05 | global batch size: 48 | lm loss: 6.346509E+00 | loss scale: 4096.0 | grad norm: 76921.340 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5344/ 159576 | consumed samples: 138384 | elapsed time per iteration (ms): 15797.4 | learning rate: 3.830E-05 | global batch size: 48 | lm loss: 6.464560E+00 | loss scale: 4096.0 | grad norm: 73833.261 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5345/ 159576 | consumed samples: 138432 | elapsed time per iteration (ms): 15447.3 | learning rate: 3.831E-05 | global batch size: 48 | lm loss: 6.491942E+00 | loss scale: 4096.0 | grad norm: 58609.094 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5346/ 159576 | consumed samples: 138480 | elapsed time per iteration (ms): 15470.6 | learning rate: 3.833E-05 | global batch size: 48 | lm loss: 6.408776E+00 | loss scale: 4096.0 | grad norm: 61084.726 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5347/ 159576 | consumed samples: 138528 | elapsed time per iteration (ms): 15595.7 | learning rate: 3.834E-05 | global batch size: 48 | lm loss: 6.317072E+00 | loss scale: 4096.0 | grad norm: 79107.564 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5348/ 159576 | consumed samples: 138576 | elapsed time per iteration (ms): 15857.5 | learning rate: 3.835E-05 | global batch size: 48 | lm loss: 6.342214E+00 | loss scale: 4096.0 | grad norm: 82396.508 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5349/ 159576 | consumed samples: 138624 | elapsed time per iteration (ms): 15501.3 | learning rate: 3.837E-05 | global batch size: 48 | lm loss: 6.416060E+00 | loss scale: 4096.0 | grad norm: 58909.391 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5350/ 159576 | consumed samples: 138672 | elapsed time per iteration (ms): 15334.9 | learning rate: 3.838E-05 | global batch size: 48 | lm loss: 6.348287E+00 | loss scale: 4096.0 | grad norm: 54069.980 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5351/ 159576 | consumed samples: 138720 | elapsed time per iteration (ms): 15454.2 | learning rate: 3.839E-05 | global batch size: 48 | lm loss: 6.456007E+00 | loss scale: 4096.0 | grad norm: 61307.306 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5352/ 159576 | consumed samples: 138768 | elapsed time per iteration (ms): 15972.1 | learning rate: 3.841E-05 | global batch size: 48 | lm loss: 6.276731E+00 | loss scale: 4096.0 | grad norm: 62789.049 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5353/ 159576 | consumed samples: 138816 | elapsed time per iteration (ms): 15447.0 | learning rate: 3.842E-05 | global batch size: 48 | lm loss: 6.443192E+00 | loss scale: 4096.0 | grad norm: 75454.112 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5354/ 159576 | consumed samples: 138864 | elapsed time per iteration (ms): 15426.1 | learning rate: 3.843E-05 | global batch size: 48 | lm loss: 6.301665E+00 | loss scale: 4096.0 | grad norm: 66381.021 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5355/ 159576 | consumed samples: 138912 | elapsed time per iteration (ms): 15465.4 | learning rate: 3.845E-05 | global batch size: 48 | lm loss: 6.453572E+00 | loss scale: 4096.0 | grad norm: 63236.178 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5356/ 159576 | consumed samples: 138960 | elapsed time per iteration (ms): 15595.7 | learning rate: 3.846E-05 | global batch size: 48 | lm loss: 6.391494E+00 | loss scale: 4096.0 | grad norm: 78457.049 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5357/ 159576 | consumed samples: 139008 | elapsed time per iteration (ms): 15508.4 | learning rate: 3.847E-05 | global batch size: 48 | lm loss: 6.379974E+00 | loss scale: 4096.0 | grad norm: 85282.485 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5358/ 159576 | consumed samples: 139056 | elapsed time per iteration (ms): 15495.7 | learning rate: 3.849E-05 | global batch size: 48 | lm loss: 6.517261E+00 | loss scale: 4096.0 | grad norm: 75329.391 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5359/ 159576 | consumed samples: 139104 | elapsed time per iteration (ms): 15455.1 | learning rate: 3.850E-05 | global batch size: 48 | lm loss: 6.311386E+00 | loss scale: 4096.0 | grad norm: 74599.792 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5360/ 159576 | consumed samples: 139152 | elapsed time per iteration (ms): 15693.4 | learning rate: 3.851E-05 | global batch size: 48 | lm loss: 6.481428E+00 | loss scale: 4096.0 | grad norm: 77215.648 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5361/ 159576 | consumed samples: 139200 | elapsed time per iteration (ms): 15475.6 | learning rate: 3.853E-05 | global batch size: 48 | lm loss: 6.331719E+00 | loss scale: 4096.0 | grad norm: 60279.803 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5362/ 159576 | consumed samples: 139248 | elapsed time per iteration
(ms): 15551.6 | learning rate: 3.854E-05 | global batch size: 48 | lm loss: 6.506707E+00 | loss scale: 4096.0 | grad norm: 57442.387 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5363/ 159576 | consumed samples: 139296 | elapsed time per iteration (ms): 15525.0 | learning rate: 3.855E-05 | global batch size: 48 | lm loss: 6.283090E+00 | loss scale: 4096.0 | grad norm: 69167.961 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5364/ 159576 | consumed samples: 139344 | elapsed time per iteration (ms): 15703.9 | learning rate: 3.857E-05 | global batch size: 48 | lm loss: 6.344968E+00 | loss scale: 4096.0 | grad norm: 66351.451 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5365/ 159576 | consumed samples: 139392 | elapsed time per iteration (ms): 15511.9 | learning rate: 3.858E-05 | global batch size: 48 | lm loss: 6.402239E+00 | loss scale: 4096.0 | grad norm: 69893.747 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5366/ 159576 | consumed samples: 139440 | elapsed time per iteration (ms): 15507.6 | learning rate: 3.859E-05 | global batch size: 48 | lm loss: 6.510591E+00 | loss scale: 4096.0 | grad norm: 73294.922 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5367/ 159576 | consumed samples: 139488 | elapsed time per iteration (ms): 15841.0 | learning rate: 3.861E-05 | global batch size: 48 | lm loss: 6.292207E+00 | loss scale: 4096.0 | grad norm: 69220.189 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5368/ 159576 | consumed samples: 139536 | elapsed time per iteration (ms): 15748.2 | learning rate: 3.862E-05 | global batch size: 48 | lm loss: 6.492587E+00 | loss scale: 4096.0 | grad norm: 78294.485 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5369/ 159576 | consumed samples: 139584 | elapsed time per iteration (ms): 15492.3 | learning rate: 3.863E-05 | global batch size: 48 | lm loss: 6.493845E+00 | loss scale: 4096.0 | grad norm: 94517.294 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5370/ 159576 | consumed samples: 139632 | elapsed time per iteration (ms): 15493.8 | learning rate: 3.864E-05 | global batch size: 48 | lm loss: 6.430061E+00 | loss scale: 4096.0 | grad norm: 77523.471 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5371/ 159576 | consumed samples: 139680 | elapsed time per iteration (ms): 15870.2 | learning rate: 3.866E-05 | global batch size: 48 | lm loss: 6.411311E+00 | loss scale: 4096.0 | grad norm: 69582.630 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5372/ 159576 | consumed samples: 139728 | elapsed time per iteration (ms): 15517.9 | learning rate: 3.867E-05 | global batch size: 48 | lm loss: 6.515477E+00 | loss scale: 4096.0 | grad norm: 75626.793 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5373/ 159576 | consumed samples: 139776 | elapsed time per iteration (ms): 15491.8 | learning rate: 3.868E-05 | global batch size: 48 | lm loss: 6.453342E+00 | loss scale: 4096.0 | grad norm: 69940.821 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5374/ 159576 | consumed samples: 139824 | elapsed time per iteration (ms): 15511.6 | learning rate: 3.870E-05 | global batch size: 48 | lm loss: 6.378087E+00 | loss scale: 4096.0 | grad norm: 70420.660 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5375/ 159576 | consumed samples: 139872 | elapsed time per iteration (ms): 15836.7 | learning rate: 
3.871E-05 | global batch size: 48 | lm loss: 6.371119E+00 | loss scale: 4096.0 | grad norm: 56046.647 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5376/ 159576 | consumed samples: 139920 | elapsed time per iteration (ms): 15468.7 | learning rate: 3.872E-05 | global batch size: 48 | lm loss: 6.480386E+00 | loss scale: 4096.0 | grad norm: 67254.408 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5377/ 159576 | consumed samples: 139968 | elapsed time per iteration (ms): 15505.8 | learning rate: 3.874E-05 | global batch size: 48 | lm loss: 6.445705E+00 | loss scale: 4096.0 | grad norm: 58120.342 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5378/ 159576 | consumed samples: 140016 | elapsed time per iteration (ms): 15512.2 | learning rate: 3.875E-05 | global batch size: 48 | lm loss: 6.383876E+00 | loss scale: 4096.0 | grad norm: 63811.158 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5379/ 159576 | consumed samples: 140064 | elapsed time per iteration (ms): 15885.3 | learning rate: 3.876E-05 | global batch size: 48 | lm loss: 6.430426E+00 | loss scale: 4096.0 | grad norm: 71627.105 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5380/ 159576 | consumed samples: 140112 | elapsed time per iteration (ms): 15514.4 | learning rate: 3.878E-05 | global batch size: 48 | lm loss: 6.352599E+00 | loss scale: 4096.0 | grad norm: 55768.573 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5381/ 159576 | consumed samples: 140160 | elapsed time per iteration (ms): 15536.5 | learning rate: 3.879E-05 | global batch size: 48 | lm loss: 6.462265E+00 | loss scale: 4096.0 | grad norm: 76307.339 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan 
iterations: 0 | time (ms) iteration 5382/ 159576 | consumed samples: 140208 | elapsed time per iteration (ms): 15499.8 | learning rate: 3.880E-05 | global batch size: 48 | lm loss: 6.439154E+00 | loss scale: 4096.0 | grad norm: 97619.861 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5383/ 159576 | consumed samples: 140256 | elapsed time per iteration (ms): 15693.9 | learning rate: 3.882E-05 | global batch size: 48 | lm loss: 6.327425E+00 | loss scale: 4096.0 | grad norm: 69803.211 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5384/ 159576 | consumed samples: 140304 | elapsed time per iteration (ms): 15550.5 | learning rate: 3.883E-05 | global batch size: 48 | lm loss: 6.391693E+00 | loss scale: 4096.0 | grad norm: 66211.348 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5385/ 159576 | consumed samples: 140352 | elapsed time per iteration (ms): 15520.0 | learning rate: 3.884E-05 | global batch size: 48 | lm loss: 6.323473E+00 | loss scale: 4096.0 | grad norm: 68034.810 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5386/ 159576 | consumed samples: 140400 | elapsed time per iteration (ms): 15545.0 | learning rate: 3.886E-05 | global batch size: 48 | lm loss: 6.299393E+00 | loss scale: 4096.0 | grad norm: 85492.599 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5387/ 159576 | consumed samples: 140448 | elapsed time per iteration (ms): 15684.9 | learning rate: 3.887E-05 | global batch size: 48 | lm loss: 6.374225E+00 | loss scale: 4096.0 | grad norm: 72949.757 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5388/ 159576 | consumed samples: 140496 | elapsed time per iteration (ms): 15553.2 | learning rate: 3.888E-05 | global batch size: 48 
| lm loss: 6.446224E+00 | loss scale: 4096.0 | grad norm: 83315.401 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5389/ 159576 | consumed samples: 140544 | elapsed time per iteration (ms): 15520.1 | learning rate: 3.890E-05 | global batch size: 48 | lm loss: 6.336344E+00 | loss scale: 4096.0 | grad norm: 60566.619 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5390/ 159576 | consumed samples: 140592 | elapsed time per iteration (ms): 15438.2 | learning rate: 3.891E-05 | global batch size: 48 | lm loss: 6.437949E+00 | loss scale: 4096.0 | grad norm: 93800.672 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5391/ 159576 | consumed samples: 140640 | elapsed time per iteration (ms): 15842.4 | learning rate: 3.892E-05 | global batch size: 48 | lm loss: 6.445059E+00 | loss scale: 4096.0 | grad norm: 67207.362 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5392/ 159576 | consumed samples: 140688 | elapsed time per iteration (ms): 15543.4 | learning rate: 3.894E-05 | global batch size: 48 | lm loss: 6.340952E+00 | loss scale: 4096.0 | grad norm: 92289.634 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5393/ 159576 | consumed samples: 140736 | elapsed time per iteration (ms): 15518.9 | learning rate: 3.895E-05 | global batch size: 48 | lm loss: 6.416577E+00 | loss scale: 4096.0 | grad norm: 84099.384 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5394/ 159576 | consumed samples: 140784 | elapsed time per iteration (ms): 15997.3 | learning rate: 3.896E-05 | global batch size: 48 | lm loss: 6.439622E+00 | loss scale: 4096.0 | grad norm: 54809.573 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) 
iteration 5395/ 159576 | consumed samples: 140832 | elapsed time per iteration (ms): 15450.3 | learning rate: 3.898E-05 | global batch size: 48 | lm loss: 6.441430E+00 | loss scale: 4096.0 | grad norm: 63144.662 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5396/ 159576 | consumed samples: 140880 | elapsed time per iteration (ms): 15568.2 | learning rate: 3.899E-05 | global batch size: 48 | lm loss: 6.424047E+00 | loss scale: 4096.0 | grad norm: 106261.057 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5397/ 159576 | consumed samples: 140928 | elapsed time per iteration (ms): 15464.4 | learning rate: 3.900E-05 | global batch size: 48 | lm loss: 6.325677E+00 | loss scale: 4096.0 | grad norm: 64383.277 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5398/ 159576 | consumed samples: 140976 | elapsed time per iteration (ms): 15883.9 | learning rate: 3.902E-05 | global batch size: 48 | lm loss: 6.582463E+00 | loss scale: 4096.0 | grad norm: 66662.490 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5399/ 159576 | consumed samples: 141024 | elapsed time per iteration (ms): 15497.5 | learning rate: 3.903E-05 | global batch size: 48 | lm loss: 6.498641E+00 | loss scale: 4096.0 | grad norm: 59391.511 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5400/ 159576 | consumed samples: 141072 | elapsed time per iteration (ms): 15569.9 | learning rate: 3.904E-05 | global batch size: 48 | lm loss: 6.283938E+00 | loss scale: 4096.0 | grad norm: 64487.813 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5401/ 159576 | consumed samples: 141120 | elapsed time per iteration (ms): 15526.8 | learning rate: 3.906E-05 | global batch size: 48 | lm loss: 6.336715E+00 
| loss scale: 4096.0 | grad norm: 57781.336 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5402/ 159576 | consumed samples: 141168 | elapsed time per iteration (ms): 15981.6 | learning rate: 3.907E-05 | global batch size: 48 | lm loss: 6.293415E+00 | loss scale: 4096.0 | grad norm: 92738.567 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5403/ 159576 | consumed samples: 141216 | elapsed time per iteration (ms): 15632.0 | learning rate: 3.908E-05 | global batch size: 48 | lm loss: 6.294649E+00 | loss scale: 4096.0 | grad norm: 62910.047 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5404/ 159576 | consumed samples: 141264 | elapsed time per iteration (ms): 15497.6 | learning rate: 3.910E-05 | global batch size: 48 | lm loss: 6.331801E+00 | loss scale: 4096.0 | grad norm: 64648.240 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5405/ 159576 | consumed samples: 141312 | elapsed time per iteration (ms): 15498.1 | learning rate: 3.911E-05 | global batch size: 48 | lm loss: 6.406822E+00 | loss scale: 4096.0 | grad norm: 71416.233 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5406/ 159576 | consumed samples: 141360 | elapsed time per iteration (ms): 15867.4 | learning rate: 3.912E-05 | global batch size: 48 | lm loss: 6.404875E+00 | loss scale: 4096.0 | grad norm: 56955.709 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5407/ 159576 | consumed samples: 141408 | elapsed time per iteration (ms): 15506.2 | learning rate: 3.914E-05 | global batch size: 48 | lm loss: 6.428100E+00 | loss scale: 4096.0 | grad norm: 65410.012 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5408/ 159576 | 
consumed samples: 141456 | elapsed time per iteration (ms): 15573.9 | learning rate: 3.915E-05 | global batch size: 48 | lm loss: 6.352518E+00 | loss scale: 4096.0 | grad norm: 57463.162 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5409/ 159576 | consumed samples: 141504 | elapsed time per iteration (ms): 15570.8 | learning rate: 3.916E-05 | global batch size: 48 | lm loss: 6.276915E+00 | loss scale: 4096.0 | grad norm: 56808.465 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5410/ 159576 | consumed samples: 141552 | elapsed time per iteration (ms): 15647.9 | learning rate: 3.918E-05 | global batch size: 48 | lm loss: 6.388402E+00 | loss scale: 4096.0 | grad norm: 55831.269 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5411/ 159576 | consumed samples: 141600 | elapsed time per iteration (ms): 15527.8 | learning rate: 3.919E-05 | global batch size: 48 | lm loss: 6.359324E+00 | loss scale: 4096.0 | grad norm: 58176.863 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5412/ 159576 | consumed samples: 141648 | elapsed time per iteration (ms): 15485.9 | learning rate: 3.920E-05 | global batch size: 48 | lm loss: 6.410316E+00 | loss scale: 4096.0 | grad norm: 58797.382 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5413/ 159576 | consumed samples: 141696 | elapsed time per iteration (ms): 15570.6 | learning rate: 3.922E-05 | global batch size: 48 | lm loss: 6.487602E+00 | loss scale: 4096.0 | grad norm: 54779.384 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5414/ 159576 | consumed samples: 141744 | elapsed time per iteration (ms): 15692.4 | learning rate: 3.923E-05 | global batch size: 48 | lm loss: 6.538764E+00 | loss scale: 4096.0 | 
grad norm: 56952.810 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5415/ 159576 | consumed samples: 141808 | elapsed time per iteration (ms): 16423.4 | learning rate: 3.925E-05 | global batch size: 64 | lm loss: 6.468464E+00 | loss scale: 4096.0 | grad norm: 47962.953 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5416/ 159576 | consumed samples: 141872 | elapsed time per iteration (ms): 16486.4 | learning rate: 3.927E-05 | global batch size: 64 | lm loss: 6.358836E+00 | loss scale: 4096.0 | grad norm: 79746.041 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5417/ 159576 | consumed samples: 141936 | elapsed time per iteration (ms): 16837.9 | learning rate: 3.928E-05 | global batch size: 64 | lm loss: 6.458796E+00 | loss scale: 4096.0 | grad norm: 72485.233 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5418/ 159576 | consumed samples: 142000 | elapsed time per iteration (ms): 16282.1 | learning rate: 3.930E-05 | global batch size: 64 | lm loss: 6.325031E+00 | loss scale: 4096.0 | grad norm: 50657.294 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5419/ 159576 | consumed samples: 142064 | elapsed time per iteration (ms): 16473.5 | learning rate: 3.932E-05 | global batch size: 64 | lm loss: 6.393603E+00 | loss scale: 4096.0 | grad norm: 53317.124 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5420/ 159576 | consumed samples: 142128 | elapsed time per iteration (ms): 16358.3 | learning rate: 3.934E-05 | global batch size: 64 | lm loss: 6.505975E+00 | loss scale: 4096.0 | grad norm: 76759.970 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5421/ 159576 | consumed samples: 142192 | 
elapsed time per iteration (ms): 16646.9 | learning rate: 3.936E-05 | global batch size: 64 | lm loss: 6.377459E+00 | loss scale: 4096.0 | grad norm: 61658.865 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5422/ 159576 | consumed samples: 142256 | elapsed time per iteration (ms): 16480.4 | learning rate: 3.937E-05 | global batch size: 64 | lm loss: 6.350579E+00 | loss scale: 4096.0 | grad norm: 61672.596 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5423/ 159576 | consumed samples: 142320 | elapsed time per iteration (ms): 16500.8 | learning rate: 3.939E-05 | global batch size: 64 | lm loss: 6.359305E+00 | loss scale: 4096.0 | grad norm: 71934.386 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5424/ 159576 | consumed samples: 142384 | elapsed time per iteration (ms): 16400.7 | learning rate: 3.941E-05 | global batch size: 64 | lm loss: 6.515474E+00 | loss scale: 4096.0 | grad norm: 62262.598 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5425/ 159576 | consumed samples: 142448 | elapsed time per iteration (ms): 16686.7 | learning rate: 3.943E-05 | global batch size: 64 | lm loss: 6.377324E+00 | loss scale: 4096.0 | grad norm: 66128.264 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5426/ 159576 | consumed samples: 142512 | elapsed time per iteration (ms): 16346.9 | learning rate: 3.944E-05 | global batch size: 64 | lm loss: 6.394655E+00 | loss scale: 4096.0 | grad norm: 64276.983 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5427/ 159576 | consumed samples: 142576 | elapsed time per iteration (ms): 16454.0 | learning rate: 3.946E-05 | global batch size: 64 | lm loss: 6.417256E+00 | loss scale: 4096.0 | grad norm: 55916.762 | num 
zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5428/ 159576 | consumed samples: 142640 | elapsed time per iteration (ms): 16713.8 | learning rate: 3.948E-05 | global batch size: 64 | lm loss: 6.314127E+00 | loss scale: 4096.0 | grad norm: 65443.157 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5429/ 159576 | consumed samples: 142704 | elapsed time per iteration (ms): 16492.7 | learning rate: 3.950E-05 | global batch size: 64 | lm loss: 6.349669E+00 | loss scale: 4096.0 | grad norm: 64819.083 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5430/ 159576 | consumed samples: 142768 | elapsed time per iteration (ms): 16430.1 | learning rate: 3.951E-05 | global batch size: 64 | lm loss: 6.406096E+00 | loss scale: 4096.0 | grad norm: 72027.252 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5431/ 159576 | consumed samples: 142832 | elapsed time per iteration (ms): 16452.9 | learning rate: 3.953E-05 | global batch size: 64 | lm loss: 6.422045E+00 | loss scale: 4096.0 | grad norm: 59470.191 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5432/ 159576 | consumed samples: 142896 | elapsed time per iteration (ms): 16574.0 | learning rate: 3.955E-05 | global batch size: 64 | lm loss: 6.384964E+00 | loss scale: 4096.0 | grad norm: 59229.555 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5433/ 159576 | consumed samples: 142960 | elapsed time per iteration (ms): 16448.4 | learning rate: 3.957E-05 | global batch size: 64 | lm loss: 6.388242E+00 | loss scale: 4096.0 | grad norm: 51139.017 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5434/ 159576 | consumed samples: 143024 | elapsed time per iteration 
(ms): 16378.2 | learning rate: 3.959E-05 | global batch size: 64 | lm loss: 6.422913E+00 | loss scale: 4096.0 | grad norm: 55548.958 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5435/ 159576 | consumed samples: 143088 | elapsed time per iteration (ms): 16838.8 | learning rate: 3.960E-05 | global batch size: 64 | lm loss: 6.399693E+00 | loss scale: 4096.0 | grad norm: 87728.143 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5436/ 159576 | consumed samples: 143152 | elapsed time per iteration (ms): 16458.9 | learning rate: 3.962E-05 | global batch size: 64 | lm loss: 6.291359E+00 | loss scale: 4096.0 | grad norm: 65955.697 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5437/ 159576 | consumed samples: 143216 | elapsed time per iteration (ms): 16425.2 | learning rate: 3.964E-05 | global batch size: 64 | lm loss: 6.367932E+00 | loss scale: 4096.0 | grad norm: 63150.328 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5438/ 159576 | consumed samples: 143280 | elapsed time per iteration (ms): 16418.8 | learning rate: 3.966E-05 | global batch size: 64 | lm loss: 6.365756E+00 | loss scale: 4096.0 | grad norm: 57427.195 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5439/ 159576 | consumed samples: 143344 | elapsed time per iteration (ms): 16802.3 | learning rate: 3.967E-05 | global batch size: 64 | lm loss: 6.415596E+00 | loss scale: 4096.0 | grad norm: 61605.287 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5440/ 159576 | consumed samples: 143408 | elapsed time per iteration (ms): 16516.9 | learning rate: 3.969E-05 | global batch size: 64 | lm loss: 6.414165E+00 | loss scale: 4096.0 | grad norm: 64434.632 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5441/ 159576 | consumed samples: 143472 | elapsed time per iteration (ms): 16398.0 | learning rate: 3.971E-05 | global batch size: 64 | lm loss: 6.425170E+00 | loss scale: 4096.0 | grad norm: 63830.236 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5442/ 159576 | consumed samples: 143536 | elapsed time per iteration (ms): 16330.0 | learning rate: 3.973E-05 | global batch size: 64 | lm loss: 6.420317E+00 | loss scale: 4096.0 | grad norm: 80818.483 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5443/ 159576 | consumed samples: 143600 | elapsed time per iteration (ms): 16646.2 | learning rate: 3.975E-05 | global batch size: 64 | lm loss: 6.404300E+00 | loss scale: 4096.0 | grad norm: 66058.957 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5444/ 159576 | consumed samples: 143664 | elapsed time per iteration (ms): 16389.9 | learning rate: 3.976E-05 | global batch size: 64 | lm loss: 6.307170E+00 | loss scale: 4096.0 | grad norm: 64553.082 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5445/ 159576 | consumed samples: 143728 | elapsed time per iteration (ms): 16425.8 | learning rate: 3.978E-05 | global batch size: 64 | lm loss: 6.474117E+00 | loss scale: 4096.0 | grad norm: 54414.389 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5446/ 159576 | consumed samples: 143792 | elapsed time per iteration (ms): 16855.6 | learning rate: 3.980E-05 | global batch size: 64 | lm loss: 6.329272E+00 | loss scale: 4096.0 | grad norm: 67896.275 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5447/ 159576 | consumed samples: 143856 | elapsed time per iteration (ms): 16363.1 | learning rate: 
3.982E-05 | global batch size: 64 | lm loss: 6.485427E+00 | loss scale: 4096.0 | grad norm: 55200.098 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5448/ 159576 | consumed samples: 143920 | elapsed time per iteration (ms): 16446.4 | learning rate: 3.983E-05 | global batch size: 64 | lm loss: 6.474103E+00 | loss scale: 4096.0 | grad norm: 58759.422 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5449/ 159576 | consumed samples: 143984 | elapsed time per iteration (ms): 16365.5 | learning rate: 3.985E-05 | global batch size: 64 | lm loss: 6.386650E+00 | loss scale: 4096.0 | grad norm: 69075.558 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5450/ 159576 | consumed samples: 144048 | elapsed time per iteration (ms): 16855.4 | learning rate: 3.987E-05 | global batch size: 64 | lm loss: 6.407839E+00 | loss scale: 4096.0 | grad norm: 76751.714 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5451/ 159576 | consumed samples: 144112 | elapsed time per iteration (ms): 16481.2 | learning rate: 3.989E-05 | global batch size: 64 | lm loss: 6.437217E+00 | loss scale: 4096.0 | grad norm: 60762.834 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5452/ 159576 | consumed samples: 144176 | elapsed time per iteration (ms): 16387.3 | learning rate: 3.991E-05 | global batch size: 64 | lm loss: 6.391966E+00 | loss scale: 4096.0 | grad norm: 57835.999 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5453/ 159576 | consumed samples: 144240 | elapsed time per iteration (ms): 16456.9 | learning rate: 3.992E-05 | global batch size: 64 | lm loss: 6.407461E+00 | loss scale: 4096.0 | grad norm: 56276.948 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan 
iterations: 0 | time (ms)
iteration 5454/ 159576 | consumed samples: 144304 | elapsed time per iteration (ms): 16533.3 | learning rate: 3.994E-05 | global batch size: 64 | lm loss: 6.319425E+00 | loss scale: 4096.0 | grad norm: 66856.562 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5455/ 159576 | consumed samples: 144368 | elapsed time per iteration (ms): 16417.1 | learning rate: 3.996E-05 | global batch size: 64 | lm loss: 6.377168E+00 | loss scale: 4096.0 | grad norm: 53863.935 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5456/ 159576 | consumed samples: 144432 | elapsed time per iteration (ms): 16422.1 | learning rate: 3.998E-05 | global batch size: 64 | lm loss: 6.368913E+00 | loss scale: 4096.0 | grad norm: 63261.354 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5457/ 159576 | consumed samples: 144496 | elapsed time per iteration (ms): 16738.2 | learning rate: 3.999E-05 | global batch size: 64 | lm loss: 6.264383E+00 | loss scale: 4096.0 | grad norm: 64656.043 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5458/ 159576 | consumed samples: 144560 | elapsed time per iteration (ms): 16315.9 | learning rate: 4.001E-05 | global batch size: 64 | lm loss: 6.410008E+00 | loss scale: 4096.0 | grad norm: 82472.599 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5459/ 159576 | consumed samples: 144624 | elapsed time per iteration (ms): 16385.7 | learning rate: 4.003E-05 | global batch size: 64 | lm loss: 6.419100E+00 | loss scale: 4096.0 | grad norm: 81581.674 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5460/ 159576 | consumed samples: 144688 | elapsed time per iteration (ms): 16422.6 | learning rate: 4.005E-05 | global batch size: 64 | lm loss: 6.374327E+00 | loss scale: 4096.0 | grad norm: 77883.993 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5461/ 159576 | consumed samples: 144752 | elapsed time per iteration (ms): 16514.0 | learning rate: 4.007E-05 | global batch size: 64 | lm loss: 6.323710E+00 | loss scale: 4096.0 | grad norm: 59535.385 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5462/ 159576 | consumed samples: 144816 | elapsed time per iteration (ms): 16520.4 | learning rate: 4.008E-05 | global batch size: 64 | lm loss: 6.325150E+00 | loss scale: 4096.0 | grad norm: 54807.099 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5463/ 159576 | consumed samples: 144880 | elapsed time per iteration (ms): 16362.9 | learning rate: 4.010E-05 | global batch size: 64 | lm loss: 6.461391E+00 | loss scale: 4096.0 | grad norm: 74839.084 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5464/ 159576 | consumed samples: 144944 | elapsed time per iteration (ms): 16408.3 | learning rate: 4.012E-05 | global batch size: 64 | lm loss: 6.392217E+00 | loss scale: 4096.0 | grad norm: 61727.667 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5465/ 159576 | consumed samples: 145008 | elapsed time per iteration (ms): 16556.8 | learning rate: 4.014E-05 | global batch size: 64 | lm loss: 6.349445E+00 | loss scale: 4096.0 | grad norm: 90938.249 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5466/ 159576 | consumed samples: 145072 | elapsed time per iteration (ms): 16389.1 | learning rate: 4.015E-05 | global batch size: 64 | lm loss: 6.314983E+00 | loss scale: 4096.0 | grad norm: 62408.172 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5467/ 159576 | consumed samples: 145136 | elapsed time per iteration (ms): 16364.1 | learning rate: 4.017E-05 | global batch size: 64 | lm loss: 6.412921E+00 | loss scale: 4096.0 | grad norm: 82535.193 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5468/ 159576 | consumed samples: 145200 | elapsed time per iteration (ms): 16712.9 | learning rate: 4.019E-05 | global batch size: 64 | lm loss: 6.508467E+00 | loss scale: 4096.0 | grad norm: 53388.956 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5469/ 159576 | consumed samples: 145264 | elapsed time per iteration (ms): 16357.7 | learning rate: 4.021E-05 | global batch size: 64 | lm loss: 6.367021E+00 | loss scale: 4096.0 | grad norm: 88053.691 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5470/ 159576 | consumed samples: 145328 | elapsed time per iteration (ms): 16424.7 | learning rate: 4.022E-05 | global batch size: 64 | lm loss: 6.396588E+00 | loss scale: 4096.0 | grad norm: 83281.076 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5471/ 159576 | consumed samples: 145392 | elapsed time per iteration (ms): 16363.6 | learning rate: 4.024E-05 | global batch size: 64 | lm loss: 6.387273E+00 | loss scale: 4096.0 | grad norm: 56875.433 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5472/ 159576 | consumed samples: 145456 | elapsed time per iteration (ms): 16523.2 | learning rate: 4.026E-05 | global batch size: 64 | lm loss: 6.456463E+00 | loss scale: 4096.0 | grad norm: 60270.862 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5473/ 159576 | consumed samples: 145520 | elapsed time per iteration (ms): 16398.7 | learning rate: 4.028E-05 | global batch size: 64 | lm loss: 6.460003E+00 | loss scale: 4096.0 | grad norm: 61151.257 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5474/ 159576 | consumed samples: 145584 | elapsed time per iteration (ms): 16345.5 | learning rate: 4.030E-05 | global batch size: 64 | lm loss: 6.443559E+00 | loss scale: 4096.0 | grad norm: 83130.420 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5475/ 159576 | consumed samples: 145648 | elapsed time per iteration (ms): 16591.9 | learning rate: 4.031E-05 | global batch size: 64 | lm loss: 6.454519E+00 | loss scale: 4096.0 | grad norm: 69198.394 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5476/ 159576 | consumed samples: 145712 | elapsed time per iteration (ms): 16643.0 | learning rate: 4.033E-05 | global batch size: 64 | lm loss: 6.424469E+00 | loss scale: 4096.0 | grad norm: 57626.523 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5477/ 159576 | consumed samples: 145776 | elapsed time per iteration (ms): 16362.1 | learning rate: 4.035E-05 | global batch size: 64 | lm loss: 6.285227E+00 | loss scale: 4096.0 | grad norm: 87864.793 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5478/ 159576 | consumed samples: 145840 | elapsed time per iteration (ms): 16435.9 | learning rate: 4.037E-05 | global batch size: 64 | lm loss: 6.372074E+00 | loss scale: 4096.0 | grad norm: 67542.980 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5479/ 159576 | consumed samples: 145904 | elapsed time per iteration (ms): 16597.3 | learning rate: 4.038E-05 | global batch size: 64 | lm loss: 6.438199E+00 | loss scale: 4096.0 | grad norm: 74119.106 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5480/ 159576 | consumed samples: 145968 | elapsed time per iteration (ms): 16483.8 | learning rate: 4.040E-05 | global batch size: 64 | lm loss: 6.487626E+00 | loss scale: 4096.0 | grad norm: 68136.689 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
[2021-09-25 00:07:47] PULSE: tr8-104B is waiting for the previous job to finish before scheduling a new one using the dependency mechanism (1165978_[1-10%1] on 'gpu_p13' partition)
[2021-09-25 00:07:47] PULSE: tr8-104B is running for 18:15:36 since 2021-09-24T05:52:11 (1162855_1 on 'gpu_p13' partition (r6i4n[5,7],r6i5n[2,7-8],r6i6n[0,2,6],r7i2n[4-5],r7i6n[2-4],r7i7n[7-8],r8i0n[2-3,5-8],r8i1n[0,2-4],r8i2n8,r8i3n[0-2],r8i5n[3-4],r8i7n[3-8],r9i0n[0-2],r9i1n[0-3],r9i2n[3-5,8],r9i3n[0-1,7-8],r9i4n[0-2],r9i5n[3-8],r9i6n[0,7-8]))
iteration 5481/ 159576 | consumed samples: 146032 | elapsed time per iteration (ms): 16373.0 | learning rate: 4.042E-05 | global batch size: 64 | lm loss: 6.280901E+00 | loss scale: 4096.0 | grad norm: 89214.030 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5482/ 159576 | consumed samples: 146096 | elapsed time per iteration (ms): 16391.1 | learning rate: 4.044E-05 | global batch size: 64 | lm loss: 6.407492E+00 | loss scale: 4096.0 | grad norm: 71190.860 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5483/ 159576 | consumed samples: 146160 | elapsed time per iteration (ms): 16510.6 | learning rate: 4.046E-05 | global batch size: 64 | lm loss: 6.338043E+00 | loss scale: 4096.0 | grad norm: 80052.869 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5484/ 159576 | consumed samples: 146224 | elapsed time per iteration (ms): 16428.2 | learning rate: 4.047E-05 | global batch size: 64 | lm loss: 6.381162E+00 | loss scale: 4096.0 | grad norm: 66785.930 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5485/ 159576 | consumed samples: 146288 | elapsed time per iteration (ms): 16390.1 | learning rate: 4.049E-05 | global batch size: 64 | lm loss: 6.377982E+00 | loss scale: 4096.0 | grad norm: 73739.230 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5486/ 159576 | consumed samples: 146352 | elapsed time per iteration (ms): 16772.0 | learning rate: 4.051E-05 | global batch size: 64 | lm loss: 6.417017E+00 | loss scale: 4096.0 | grad norm: 101012.887 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5487/ 159576 | consumed samples: 146416 | elapsed time per iteration (ms): 16505.3 | learning rate: 4.053E-05 | global batch size: 64 | lm loss: 6.375125E+00 | loss scale: 4096.0 | grad norm: 62796.428 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5488/ 159576 | consumed samples: 146480 | elapsed time per iteration (ms): 16398.9 | learning rate: 4.054E-05 | global batch size: 64 | lm loss: 6.370068E+00 | loss scale: 4096.0 | grad norm: 53653.988 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5489/ 159576 | consumed samples: 146544 | elapsed time per iteration (ms): 16369.7 | learning rate: 4.056E-05 | global batch size: 64 | lm loss: 6.376281E+00 | loss scale: 4096.0 | grad norm: 81099.504 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5490/ 159576 | consumed samples: 146608 | elapsed time per iteration (ms): 16827.2 | learning rate: 4.058E-05 | global batch size: 64 | lm loss: 6.479604E+00 | loss scale: 4096.0 | grad norm: 63855.765 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5491/ 159576 | consumed samples: 146672 | elapsed time per iteration (ms): 16415.6 | learning rate: 4.060E-05 | global batch size: 64 | lm loss: 6.352095E+00 | loss scale: 4096.0 | grad norm: 55122.067 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5492/ 159576 | consumed samples: 146736 | elapsed time per iteration (ms): 16444.9 | learning rate: 4.062E-05 | global batch size: 64 | lm loss: 6.506047E+00 | loss scale: 4096.0 | grad norm: 75137.891 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5493/ 159576 | consumed samples: 146800 | elapsed time per iteration (ms): 16342.5 | learning rate: 4.063E-05 | global batch size: 64 | lm loss: 6.379695E+00 | loss scale: 4096.0 | grad norm: 66901.698 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5494/ 159576 | consumed samples: 146864 | elapsed time per iteration (ms): 16502.1 | learning rate: 4.065E-05 | global batch size: 64 | lm loss: 6.368460E+00 | loss scale: 4096.0 | grad norm: 77897.280 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5495/ 159576 | consumed samples: 146928 | elapsed time per iteration (ms): 16338.1 | learning rate: 4.067E-05 | global batch size: 64 | lm loss: 6.329938E+00 | loss scale: 4096.0 | grad norm: 61931.764 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5496/ 159576 | consumed samples: 146992 | elapsed time per iteration (ms): 16346.0 | learning rate: 4.069E-05 | global batch size: 64 | lm loss: 6.425272E+00 | loss scale: 4096.0 | grad norm: 66524.327 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5497/ 159576 | consumed samples: 147056 | elapsed time per iteration (ms): 16765.2 | learning rate: 4.070E-05 | global batch size: 64 | lm loss: 6.296051E+00 | loss scale: 4096.0 | grad norm: 85285.961 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5498/ 159576 | consumed samples: 147120 | elapsed time per iteration (ms): 16329.2 | learning rate: 4.072E-05 | global batch size: 64 | lm loss: 6.365289E+00 | loss scale: 4096.0 | grad norm: 66015.174 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5499/ 159576 | consumed samples: 147184 | elapsed time per iteration (ms): 16383.4 | learning rate: 4.074E-05 | global batch size: 64 | lm loss: 6.294851E+00 | loss scale: 4096.0 | grad norm: 79758.414 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5500/ 159576 | consumed samples: 147248 | elapsed time per iteration (ms): 16337.1 | learning rate: 4.076E-05 | global batch size: 64 | lm loss: 6.289442E+00 | loss scale: 4096.0 | grad norm: 74687.965 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5501/ 159576 | consumed samples: 147312 | elapsed time per iteration (ms): 16790.4 | learning rate: 4.078E-05 | global batch size: 64 | lm loss: 6.322903E+00 | loss scale: 4096.0 | grad norm: 77364.060 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5502/ 159576 | consumed samples: 147376 | elapsed time per iteration (ms): 16423.5 | learning rate: 4.079E-05 | global batch size: 64 | lm loss: 6.460203E+00 | loss scale: 4096.0 | grad norm: 73803.838 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5503/ 159576 | consumed samples: 147440 | elapsed time per iteration (ms): 16368.8 | learning rate: 4.081E-05 | global batch size: 64 | lm loss: 6.396315E+00 | loss scale: 4096.0 | grad norm: 71129.126 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5504/ 159576 | consumed samples: 147504 | elapsed time per iteration (ms): 16346.2 | learning rate: 4.083E-05 | global batch size: 64 | lm loss: 6.425894E+00 | loss scale: 4096.0 | grad norm: 98647.514 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5505/ 159576 | consumed samples: 147568 | elapsed time per iteration (ms): 16678.7 | learning rate: 4.085E-05 | global batch size: 64 | lm loss: 6.381792E+00 | loss scale: 4096.0 | grad norm: 89626.671 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5506/ 159576 | consumed samples: 147632 | elapsed time per iteration (ms): 16332.5 | learning rate: 4.086E-05 | global batch size: 64 | lm loss: 6.483613E+00 | loss scale: 4096.0 | grad norm: 94069.099 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5507/ 159576 | consumed samples: 147696 | elapsed time per iteration (ms): 16400.4 | learning rate: 4.088E-05 | global batch size: 64 | lm loss: 6.236539E+00 | loss scale: 4096.0 | grad norm: 66871.431 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5508/ 159576 | consumed samples: 147760 | elapsed time per iteration (ms): 16657.8 | learning rate: 4.090E-05 | global batch size: 64 | lm loss: 6.445796E+00 | loss scale: 4096.0 | grad norm: 79385.972 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5509/ 159576 | consumed samples: 147824 | elapsed time per iteration (ms): 16347.0 | learning rate: 4.092E-05 | global batch size: 64 | lm loss: 6.421635E+00 | loss scale: 4096.0 | grad norm: 76910.947 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5510/ 159576 | consumed samples: 147888 | elapsed time per iteration (ms): 16379.6 | learning rate: 4.093E-05 | global batch size: 64 | lm loss: 6.403854E+00 | loss scale: 4096.0 | grad norm: 131977.376 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5511/ 159576 | consumed samples: 147952 | elapsed time per iteration (ms): 16364.3 | learning rate: 4.095E-05 | global batch size: 64 | lm loss: 6.393543E+00 | loss scale: 4096.0 | grad norm: 62655.958 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5512/ 159576 | consumed samples: 148016 | elapsed time per iteration (ms): 16734.0 | learning rate: 4.097E-05 | global batch size: 64 | lm loss: 6.378099E+00 | loss scale: 4096.0 | grad norm: 71057.330 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5513/ 159576 | consumed samples: 148080 | elapsed time per iteration (ms): 16360.1 | learning rate: 4.099E-05 | global batch size: 64 | lm loss: 6.439700E+00 | loss scale: 4096.0 | grad norm: 78346.761 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5514/ 159576 | consumed samples: 148144 | elapsed time per iteration (ms): 16356.7 | learning rate: 4.101E-05 | global batch size: 64 | lm loss: 6.380426E+00 | loss scale: 4096.0 | grad norm: 65583.994 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5515/ 159576 | consumed samples: 148208 | elapsed time per iteration (ms): 16416.2 | learning rate: 4.102E-05 | global batch size: 64 | lm loss: 6.492000E+00 | loss scale: 4096.0 | grad norm: 73724.763 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5516/ 159576 | consumed samples: 148272 | elapsed time per iteration (ms): 16451.6 | learning rate: 4.104E-05 | global batch size: 64 | lm loss: 6.433869E+00 | loss scale: 4096.0 | grad norm: 93695.526 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5517/ 159576 | consumed samples: 148336 | elapsed time per iteration (ms): 16367.1 | learning rate: 4.106E-05 | global batch size: 64 | lm loss: 6.316652E+00 | loss scale: 4096.0 | grad norm: 93995.663 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5518/ 159576 | consumed samples: 148400 | elapsed time per iteration (ms): 16352.2 | learning rate: 4.108E-05 | global batch size: 64 | lm loss: 6.331068E+00 | loss scale: 4096.0 | grad norm: 64601.046 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5519/ 159576 | consumed samples: 148464 | elapsed time per iteration (ms): 16660.3 | learning rate: 4.109E-05 | global batch size: 64 | lm loss: 6.441586E+00 | loss scale: 4096.0 | grad norm: 74837.727 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5520/ 159576 | consumed samples: 148528 | elapsed time per iteration (ms): 16346.7 | learning rate: 4.111E-05 | global batch size: 64 | lm loss: 6.422507E+00 | loss scale: 4096.0 | grad norm: 57013.348 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5521/ 159576 | consumed samples: 148592 | elapsed time per iteration (ms): 16378.9 | learning rate: 4.113E-05 | global batch size: 64 | lm loss: 6.388858E+00 | loss scale: 4096.0 | grad norm: 70843.138 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5522/ 159576 | consumed samples: 148656 | elapsed time per iteration (ms): 16311.3 | learning rate: 4.115E-05 | global batch size: 64 | lm loss: 6.335554E+00 | loss scale: 4096.0 | grad norm: 57811.716 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5523/ 159576 | consumed samples: 148720 | elapsed time per iteration (ms): 16599.0 | learning rate: 4.117E-05 | global batch size: 64 | lm loss: 6.427087E+00 | loss scale: 4096.0 | grad norm: 70169.321 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5524/ 159576 | consumed samples: 148784 | elapsed time per iteration (ms): 16322.1 | learning rate: 4.118E-05 | global batch size: 64 | lm loss: 6.400644E+00 | loss scale: 4096.0 | grad norm: 65162.867 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5525/ 159576 | consumed samples: 148848 | elapsed time per iteration (ms): 16352.5 | learning rate: 4.120E-05 | global batch size: 64 | lm loss: 6.476854E+00 | loss scale: 4096.0 | grad norm: 105828.693 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5526/ 159576 | consumed samples: 148912 | elapsed time per iteration (ms): 16357.9 | learning rate: 4.122E-05 | global batch size: 64 | lm loss: 6.444851E+00 | loss scale: 4096.0 | grad norm: 100931.662 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5527/ 159576 | consumed samples: 148976 | elapsed time per iteration (ms): 16656.2 | learning rate: 4.124E-05 | global batch size: 64 | lm loss: 6.448713E+00 | loss scale: 4096.0 | grad norm: 81570.122 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5528/ 159576 | consumed samples: 149040 | elapsed time per iteration (ms): 16320.4 | learning rate: 4.125E-05 | global batch size: 64 | lm loss: 6.406240E+00 | loss scale: 4096.0 | grad norm: 82766.539 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5529/ 159576 | consumed samples: 149104 | elapsed time per iteration (ms): 16353.3 | learning rate: 4.127E-05 | global batch size: 64 | lm loss: 6.376573E+00 | loss scale: 4096.0 | grad norm: 80155.568 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5530/ 159576 | consumed samples: 149168 | elapsed time per iteration (ms): 16695.5 | learning rate: 4.129E-05 | global batch size: 64 | lm loss: 6.316214E+00 | loss scale: 4096.0 | grad norm: 87358.885 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5531/ 159576 | consumed samples: 149232 | elapsed time per iteration (ms): 16408.8 | learning rate: 4.131E-05 | global batch size: 64 | lm loss: 6.481884E+00 | loss scale: 4096.0 | grad norm: 86550.581 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5532/ 159576 | consumed samples: 149296 | elapsed time per iteration (ms): 16343.8 | learning rate: 4.133E-05 | global batch size: 64 | lm loss: 6.483734E+00 | loss scale: 4096.0 | grad norm: 89939.876 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5533/ 159576 | consumed samples: 149360 | elapsed time per iteration (ms): 16370.7 | learning rate: 4.134E-05 | global batch size: 64 | lm loss: 6.318271E+00 | loss scale: 4096.0 | grad norm: 60516.549 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5534/ 159576 | consumed samples: 149424 | elapsed time per iteration (ms): 16594.8 | learning rate: 4.136E-05 | global batch size: 64 | lm loss: 6.391500E+00 | loss scale: 4096.0 | grad norm: 70379.262 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5535/ 159576 | consumed samples: 149488 | elapsed time per iteration (ms): 16425.6 | learning rate: 4.138E-05 | global batch size: 64 | lm loss: 6.418231E+00 | loss scale: 4096.0 | grad norm: 76225.739 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5536/ 159576 | consumed samples: 149552 | elapsed time per iteration (ms): 16364.4 | learning rate: 4.140E-05 | global batch size: 64 | lm loss: 6.461292E+00 | loss scale: 4096.0 | grad norm: 117347.500 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5537/ 159576 | consumed samples: 149616 | elapsed time per iteration (ms): 16683.3 | learning rate: 4.141E-05 | global batch size: 64 | lm loss: 6.394395E+00 | loss scale: 4096.0 | grad norm: 113236.928 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5538/ 159576 | consumed samples: 149680 | elapsed time per iteration (ms): 16407.6 | learning rate: 4.143E-05 | global batch size: 64 | lm loss: 6.348366E+00 | loss scale: 4096.0 | grad norm: 72699.803 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5539/ 159576 | consumed samples: 149744 | elapsed time per iteration (ms): 16372.4 | learning rate: 4.145E-05 | global batch size: 64 | lm loss: 6.395003E+00 | loss scale: 4096.0 | grad norm: 117054.243 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5540/ 159576 | consumed samples: 149808 | elapsed time per iteration (ms): 16344.7 | learning rate: 4.147E-05 | global batch size: 64 | lm loss: 6.345469E+00 | loss scale: 4096.0 | grad norm: 66826.178 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5541/ 159576 | consumed samples: 149872 | elapsed time per iteration (ms): 16658.7 | learning rate: 4.149E-05 | global batch size: 64 | lm loss: 6.311511E+00 | loss scale: 4096.0 | grad norm: 82398.862 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5542/ 159576 | consumed samples: 149936 | elapsed time per iteration (ms): 16382.8 | learning rate: 4.150E-05 | global batch size: 64 | lm loss: 6.407408E+00 | loss scale: 4096.0 | grad norm: 95381.993 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5543/ 159576 | consumed samples: 150000 | elapsed time per iteration (ms): 16397.3 | learning rate: 4.152E-05 | global batch size: 64 | lm loss: 6.385950E+00 | loss scale: 4096.0 | grad norm: 84966.860 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5544/ 159576 | consumed samples: 150064 | elapsed time per iteration (ms): 16328.2 | learning rate: 4.154E-05 | global batch size: 64 | lm loss: 6.386173E+00 | loss scale: 4096.0 | grad norm: 76280.982 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5545/ 159576 | consumed samples: 150128 | elapsed time per iteration (ms): 16536.9 | learning rate: 4.156E-05 | global batch size: 64 | lm loss: 6.429965E+00 | loss scale: 4096.0 | grad norm: 86199.770 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5546/ 159576 | consumed samples: 150192 | elapsed time per iteration (ms): 16341.0 | learning rate: 4.157E-05 | global batch size: 64 | lm loss: 6.440814E+00 | loss scale: 4096.0 | grad norm: 79643.661 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5547/ 159576 | consumed samples: 150256 | elapsed time per iteration (ms): 16434.5 | learning rate: 4.159E-05 | global batch size: 64 | lm loss: 6.292027E+00 | loss scale: 4096.0 | grad norm: 79649.706 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5548/ 159576 | consumed samples: 150320 | elapsed time per iteration (ms): 16744.8 | learning rate: 4.161E-05 | global batch size: 64 | lm loss: 6.363777E+00 | loss scale: 4096.0 | grad norm: 105818.884 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5549/ 159576 | consumed samples: 150384 | elapsed time per iteration (ms): 16446.0 | learning rate: 4.163E-05 | global batch size: 64 | lm loss: 6.525520E+00 | loss scale: 4096.0 | grad norm: 98900.365 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5550/ 159576 | consumed samples: 150448 | elapsed time per iteration (ms): 16313.7 | learning rate: 4.164E-05 | global batch size: 64 | lm loss: 6.426298E+00 | loss scale: 4096.0 | grad norm: 160080.224 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5551/ 159576 | consumed samples: 150512 | elapsed time per iteration (ms): 16414.2 | learning rate: 4.166E-05 | global batch size: 64 | lm loss: 6.409907E+00 | loss scale: 4096.0 | grad norm: 101291.267 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5552/ 159576 | consumed samples: 150576 | elapsed time per iteration (ms): 16772.9 | learning rate: 4.168E-05 | global batch size: 64 | lm loss: 6.312022E+00 | loss scale: 4096.0 | grad norm: 93961.085 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5553/ 159576 | consumed samples: 150640 | elapsed time per iteration (ms): 16393.9 | learning rate: 4.170E-05 | global batch size: 64 | lm loss: 6.460764E+00 | loss scale: 4096.0 | grad norm: 83044.555 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5554/ 159576 | consumed samples: 150704 | elapsed time per iteration (ms): 16414.7 | learning rate: 4.172E-05 | global batch size: 64 | lm loss: 6.395907E+00 | loss scale: 4096.0 | grad norm: 71935.935 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5555/ 159576 | consumed samples: 150768 | elapsed time per iteration (ms): 16459.3 | learning rate: 4.173E-05 | global batch size: 64 | lm loss: 6.381772E+00 | loss scale: 4096.0 | grad norm: 92358.447 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5556/ 159576 | consumed samples: 150832 | elapsed time per iteration (ms): 16620.5 | learning rate: 4.175E-05 | global batch size: 64 | lm loss: 6.334103E+00 | loss scale: 4096.0 | grad norm: 135953.299 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5557/ 159576 | consumed samples: 150896 | elapsed time per iteration (ms): 16420.0 | learning rate: 4.177E-05 | global batch size: 64 | lm loss: 6.350534E+00 | loss scale: 4096.0 | grad norm: 106866.155 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5558/ 159576 | consumed samples: 150960 | elapsed time per iteration (ms): 16394.5 | learning rate: 4.179E-05 | global batch size: 64 | lm loss: 6.449617E+00 | loss scale: 4096.0 | grad norm: 73758.521 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5559/ 159576 | consumed samples: 151024 | elapsed time per iteration (ms): 16702.3 | learning rate: 4.180E-05 | global batch size: 64 | lm loss: 6.422152E+00 | loss scale: 4096.0 | grad norm: 89216.196 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5560/ 159576 | consumed samples: 151088 | elapsed time per iteration (ms): 16526.0 | learning rate: 4.182E-05 | global batch size: 64 | lm loss: 6.502412E+00 | loss scale: 4096.0 | grad norm: 75899.056 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5561/ 159576 | consumed samples: 151152 | elapsed time per iteration (ms): 16388.8 | learning rate: 4.184E-05 | global batch size: 64 | lm loss: 6.353260E+00 | loss scale: 4096.0 | grad norm: 77216.880 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5562/ 159576 | consumed samples: 151216 | elapsed time per iteration (ms): 16375.8 | learning rate: 4.186E-05 | global batch size: 64 | lm loss: 6.380834E+00 | loss scale: 4096.0 | grad norm: 108978.238 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5563/ 159576 | consumed samples: 151280 | elapsed time per iteration (ms): 16840.5 | learning rate: 4.188E-05 | global batch size: 64 | lm loss: 6.389106E+00 | loss scale: 4096.0 | grad norm: 109665.709 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5564/ 159576 | consumed samples: 151344 | elapsed time per iteration (ms): 16437.6 | learning rate: 4.189E-05 | global batch size: 64 | lm loss: 6.440452E+00 | loss scale: 4096.0 | grad norm: 455190.539 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5565/ 159576 | consumed samples: 151408 | elapsed time per iteration (ms): 16403.9 | learning rate: 4.191E-05 | global batch size: 64 | lm loss: 6.425446E+00 | loss scale: 4096.0 | grad norm: 121150.795 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5566/ 159576 | consumed samples: 151472 | elapsed time per iteration (ms): 16435.1 | learning rate: 4.193E-05 | global batch size: 64 | lm loss: 6.344089E+00 | loss scale: 4096.0 | grad norm: 92189.151 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5567/ 159576 | consumed samples: 151536 | elapsed time per iteration (ms): 16459.4 | learning rate: 4.195E-05 | global batch size: 64 | lm loss: 6.402337E+00 | loss scale: 4096.0 | grad norm: 84995.771 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5568/ 159576 | consumed samples: 151600 | elapsed time per iteration (ms): 16389.2 | learning rate: 4.196E-05 | global batch size: 64 | lm loss: 6.522965E+00 | loss scale: 4096.0 | grad norm: 82583.540 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5569/ 159576 | consumed samples: 151664 | elapsed time per iteration (ms): 16371.9 | learning rate: 4.198E-05 | global batch size: 64 | lm loss: 6.357002E+00 | loss scale: 4096.0 | grad norm: 107776.266 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5570/ 159576 | consumed samples: 151728 | elapsed time per iteration (ms): 16715.6 | learning rate: 4.200E-05 | global batch size: 64 | lm loss: 6.462955E+00 | loss scale: 4096.0 | grad norm: 81656.007 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5571/ 159576 | consumed samples: 151792 | elapsed time per iteration (ms): 16448.5 | learning rate: 4.202E-05 | global batch size: 64 | lm loss: 6.378518E+00 | loss scale: 4096.0 | grad norm: 97168.529 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5572/ 159576 | consumed samples: 151856 | elapsed time per iteration (ms): 16375.2 | learning rate: 4.204E-05 | global batch size: 64 | lm loss: 6.426227E+00 | loss scale: 4096.0 | grad norm: 138499.065 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5573/ 159576 | consumed samples: 151920 | elapsed time per iteration (ms): 16391.0 | learning rate: 4.205E-05 | global batch size: 64 | lm loss: 6.467142E+00 | loss scale: 4096.0 | grad norm: 86986.159 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5574/ 159576 | consumed samples: 151984 | elapsed time per iteration (ms): 16660.3 | learning rate: 4.207E-05 | global batch size: 64 | lm loss: 6.343758E+00 | loss scale: 4096.0 | grad norm: 94104.183 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5575/ 159576 | consumed samples: 152048 | elapsed time per iteration (ms): 16384.3 | learning rate: 4.209E-05 | global batch size: 64 | lm loss: 6.264513E+00 | loss scale: 4096.0 | grad norm: 84463.915 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5576/ 159576 | consumed samples: 152112 | elapsed time per iteration (ms): 16429.0 | learning rate: 4.211E-05 | global batch size: 64 | lm loss: 6.395695E+00
| loss scale: 4096.0 | grad norm: 91060.071 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5577/ 159576 | consumed samples: 152176 | elapsed time per iteration (ms): 16399.6 | learning rate: 4.212E-05 | global batch size: 64 | lm loss: 6.322819E+00 | loss scale: 4096.0 | grad norm: 78884.092 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5578/ 159576 | consumed samples: 152240 | elapsed time per iteration (ms): 16529.4 | learning rate: 4.214E-05 | global batch size: 64 | lm loss: 6.361033E+00 | loss scale: 4096.0 | grad norm: 132712.269 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5579/ 159576 | consumed samples: 152304 | elapsed time per iteration (ms): 16454.4 | learning rate: 4.216E-05 | global batch size: 64 | lm loss: 6.276022E+00 | loss scale: 4096.0 | grad norm: 112417.567 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5580/ 159576 | consumed samples: 152368 | elapsed time per iteration (ms): 16401.1 | learning rate: 4.218E-05 | global batch size: 64 | lm loss: 6.375633E+00 | loss scale: 4096.0 | grad norm: 85824.899 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5581/ 159576 | consumed samples: 152432 | elapsed time per iteration (ms): 16688.1 | learning rate: 4.220E-05 | global batch size: 64 | lm loss: 6.447036E+00 | loss scale: 4096.0 | grad norm: 88314.135 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5582/ 159576 | consumed samples: 152496 | elapsed time per iteration (ms): 16427.8 | learning rate: 4.221E-05 | global batch size: 64 | lm loss: 6.438461E+00 | loss scale: 4096.0 | grad norm: 91826.151 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5583/ 159576 | 
consumed samples: 152560 | elapsed time per iteration (ms): 16326.4 | learning rate: 4.223E-05 | global batch size: 64 | lm loss: 6.404251E+00 | loss scale: 4096.0 | grad norm: 79746.044 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5584/ 159576 | consumed samples: 152624 | elapsed time per iteration (ms): 16429.7 | learning rate: 4.225E-05 | global batch size: 64 | lm loss: 6.470784E+00 | loss scale: 4096.0 | grad norm: 78255.053 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5585/ 159576 | consumed samples: 152688 | elapsed time per iteration (ms): 16577.7 | learning rate: 4.227E-05 | global batch size: 64 | lm loss: 6.352365E+00 | loss scale: 4096.0 | grad norm: 85894.611 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5586/ 159576 | consumed samples: 152752 | elapsed time per iteration (ms): 16409.6 | learning rate: 4.228E-05 | global batch size: 64 | lm loss: 6.367690E+00 | loss scale: 4096.0 | grad norm: 268686.463 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5587/ 159576 | consumed samples: 152816 | elapsed time per iteration (ms): 16393.7 | learning rate: 4.230E-05 | global batch size: 64 | lm loss: 6.334382E+00 | loss scale: 4096.0 | grad norm: 92996.321 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5588/ 159576 | consumed samples: 152880 | elapsed time per iteration (ms): 16647.8 | learning rate: 4.232E-05 | global batch size: 64 | lm loss: 6.174354E+00 | loss scale: 4096.0 | grad norm: 99570.185 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5589/ 159576 | consumed samples: 152944 | elapsed time per iteration (ms): 16470.5 | learning rate: 4.234E-05 | global batch size: 64 | lm loss: 6.349049E+00 | loss scale: 4096.0 | 
grad norm: 74523.491 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5590/ 159576 | consumed samples: 153008 | elapsed time per iteration (ms): 16348.7 | learning rate: 4.236E-05 | global batch size: 64 | lm loss: 6.388356E+00 | loss scale: 4096.0 | grad norm: 57623.843 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5591/ 159576 | consumed samples: 153072 | elapsed time per iteration (ms): 16338.9 | learning rate: 4.237E-05 | global batch size: 64 | lm loss: 6.399694E+00 | loss scale: 4096.0 | grad norm: 75852.068 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5592/ 159576 | consumed samples: 153136 | elapsed time per iteration (ms): 16704.7 | learning rate: 4.239E-05 | global batch size: 64 | lm loss: 6.327959E+00 | loss scale: 4096.0 | grad norm: 69452.758 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5593/ 159576 | consumed samples: 153200 | elapsed time per iteration (ms): 16334.3 | learning rate: 4.241E-05 | global batch size: 64 | lm loss: 6.435533E+00 | loss scale: 4096.0 | grad norm: 111529.645 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5594/ 159576 | consumed samples: 153264 | elapsed time per iteration (ms): 16385.3 | learning rate: 4.243E-05 | global batch size: 64 | lm loss: 6.438297E+00 | loss scale: 4096.0 | grad norm: 154695.606 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5595/ 159576 | consumed samples: 153328 | elapsed time per iteration (ms): 16343.1 | learning rate: 4.244E-05 | global batch size: 64 | lm loss: 6.431480E+00 | loss scale: 4096.0 | grad norm: 133987.791 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5596/ 159576 | consumed samples: 153392 | 
elapsed time per iteration (ms): 16571.5 | learning rate: 4.246E-05 | global batch size: 64 | lm loss: 6.326744E+00 | loss scale: 4096.0 | grad norm: 65072.384 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5597/ 159576 | consumed samples: 153456 | elapsed time per iteration (ms): 16304.0 | learning rate: 4.248E-05 | global batch size: 64 | lm loss: 6.450805E+00 | loss scale: 4096.0 | grad norm: 67613.081 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5598/ 159576 | consumed samples: 153520 | elapsed time per iteration (ms): 16343.8 | learning rate: 4.250E-05 | global batch size: 64 | lm loss: 6.327376E+00 | loss scale: 4096.0 | grad norm: 77614.563 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5599/ 159576 | consumed samples: 153584 | elapsed time per iteration (ms): 16672.4 | learning rate: 4.251E-05 | global batch size: 64 | lm loss: 6.502485E+00 | loss scale: 4096.0 | grad norm: 97568.320 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5600/ 159576 | consumed samples: 153648 | elapsed time per iteration (ms): 16410.3 | learning rate: 4.253E-05 | global batch size: 64 | lm loss: 6.429380E+00 | loss scale: 4096.0 | grad norm: 84231.513 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5601/ 159576 | consumed samples: 153712 | elapsed time per iteration (ms): 16391.0 | learning rate: 4.255E-05 | global batch size: 64 | lm loss: 6.436201E+00 | loss scale: 4096.0 | grad norm: 63319.618 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5602/ 159576 | consumed samples: 153776 | elapsed time per iteration (ms): 16453.8 | learning rate: 4.257E-05 | global batch size: 64 | lm loss: 6.263167E+00 | loss scale: 4096.0 | grad norm: 71392.865 | num 
zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5603/ 159576 | consumed samples: 153840 | elapsed time per iteration (ms): 16775.3 | learning rate: 4.259E-05 | global batch size: 64 | lm loss: 6.413259E+00 | loss scale: 4096.0 | grad norm: 123761.558 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5604/ 159576 | consumed samples: 153904 | elapsed time per iteration (ms): 16504.7 | learning rate: 4.260E-05 | global batch size: 64 | lm loss: 6.544505E+00 | loss scale: 4096.0 | grad norm: 83624.860 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5605/ 159576 | consumed samples: 153968 | elapsed time per iteration (ms): 16306.6 | learning rate: 4.262E-05 | global batch size: 64 | lm loss: 6.452788E+00 | loss scale: 8192.0 | grad norm: 65011.271 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5606/ 159576 | consumed samples: 154032 | elapsed time per iteration (ms): 16378.4 | learning rate: 4.264E-05 | global batch size: 64 | lm loss: 6.422714E+00 | loss scale: 8192.0 | grad norm: 246798.721 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5607/ 159576 | consumed samples: 154096 | elapsed time per iteration (ms): 16552.8 | learning rate: 4.266E-05 | global batch size: 64 | lm loss: 6.375990E+00 | loss scale: 8192.0 | grad norm: 169739.944 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5608/ 159576 | consumed samples: 154160 | elapsed time per iteration (ms): 16382.8 | learning rate: 4.267E-05 | global batch size: 64 | lm loss: 6.358736E+00 | loss scale: 8192.0 | grad norm: 157950.735 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5609/ 159576 | consumed samples: 154224 | elapsed time per iteration 
(ms): 16422.0 | learning rate: 4.269E-05 | global batch size: 64 | lm loss: 6.444921E+00 | loss scale: 8192.0 | grad norm: 125911.826 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5610/ 159576 | consumed samples: 154288 | elapsed time per iteration (ms): 9561.0 | learning rate: 4.269E-05 | global batch size: 64 | lm loss: 6.367582E+00 | loss scale: 8192.0 | grad norm: 125911.826 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5611/ 159576 | consumed samples: 154352 | elapsed time per iteration (ms): 16020.4 | learning rate: 4.271E-05 | global batch size: 64 | lm loss: 6.341266E+00 | loss scale: 8192.0 | grad norm: 196277.090 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5612/ 159576 | consumed samples: 154416 | elapsed time per iteration (ms): 16411.4 | learning rate: 4.273E-05 | global batch size: 64 | lm loss: 6.386235E+00 | loss scale: 8192.0 | grad norm: 174236.115 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5613/ 159576 | consumed samples: 154480 | elapsed time per iteration (ms): 16406.8 | learning rate: 4.275E-05 | global batch size: 64 | lm loss: 6.302393E+00 | loss scale: 8192.0 | grad norm: 159949.232 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5614/ 159576 | consumed samples: 154544 | elapsed time per iteration (ms): 16823.0 | learning rate: 4.276E-05 | global batch size: 64 | lm loss: 6.427998E+00 | loss scale: 8192.0 | grad norm: 139822.570 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5615/ 159576 | consumed samples: 154608 | elapsed time per iteration (ms): 16523.9 | learning rate: 4.278E-05 | global batch size: 64 | lm loss: 6.437964E+00 | loss scale: 8192.0 | grad norm: 148561.538 | num zeros: 0.0 | number of 
skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5616/ 159576 | consumed samples: 154672 | elapsed time per iteration (ms): 16444.1 | learning rate: 4.280E-05 | global batch size: 64 | lm loss: 6.387279E+00 | loss scale: 8192.0 | grad norm: 165172.047 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5617/ 159576 | consumed samples: 154736 | elapsed time per iteration (ms): 16455.6 | learning rate: 4.282E-05 | global batch size: 64 | lm loss: 6.365323E+00 | loss scale: 8192.0 | grad norm: 139740.137 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5618/ 159576 | consumed samples: 154800 | elapsed time per iteration (ms): 16876.6 | learning rate: 4.283E-05 | global batch size: 64 | lm loss: 6.405371E+00 | loss scale: 8192.0 | grad norm: 191865.773 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5619/ 159576 | consumed samples: 154864 | elapsed time per iteration (ms): 16465.6 | learning rate: 4.285E-05 | global batch size: 64 | lm loss: 6.400004E+00 | loss scale: 8192.0 | grad norm: 131301.224 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5620/ 159576 | consumed samples: 154928 | elapsed time per iteration (ms): 16407.9 | learning rate: 4.287E-05 | global batch size: 64 | lm loss: 6.424757E+00 | loss scale: 8192.0 | grad norm: 152162.206 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5621/ 159576 | consumed samples: 154992 | elapsed time per iteration (ms): 16429.7 | learning rate: 4.289E-05 | global batch size: 64 | lm loss: 6.415905E+00 | loss scale: 8192.0 | grad norm: 184054.677 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5622/ 159576 | consumed samples: 155056 | elapsed time per iteration (ms): 16685.6 | 
learning rate: 4.291E-05 | global batch size: 64 | lm loss: 6.440601E+00 | loss scale: 8192.0 | grad norm: 290641.293 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5623/ 159576 | consumed samples: 155120 | elapsed time per iteration (ms): 16500.9 | learning rate: 4.292E-05 | global batch size: 64 | lm loss: 6.392663E+00 | loss scale: 8192.0 | grad norm: 151394.636 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5624/ 159576 | consumed samples: 155184 | elapsed time per iteration (ms): 16485.6 | learning rate: 4.294E-05 | global batch size: 64 | lm loss: 6.440325E+00 | loss scale: 8192.0 | grad norm: 132735.433 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5625/ 159576 | consumed samples: 155248 | elapsed time per iteration (ms): 16832.2 | learning rate: 4.296E-05 | global batch size: 64 | lm loss: 6.382560E+00 | loss scale: 8192.0 | grad norm: 167706.666 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5626/ 159576 | consumed samples: 155312 | elapsed time per iteration (ms): 16294.5 | learning rate: 4.298E-05 | global batch size: 64 | lm loss: 6.422318E+00 | loss scale: 8192.0 | grad norm: 144671.305 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5627/ 159576 | consumed samples: 155376 | elapsed time per iteration (ms): 16433.6 | learning rate: 4.299E-05 | global batch size: 64 | lm loss: 6.400022E+00 | loss scale: 8192.0 | grad norm: 174837.579 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5628/ 159576 | consumed samples: 155440 | elapsed time per iteration (ms): 16385.0 | learning rate: 4.301E-05 | global batch size: 64 | lm loss: 6.465958E+00 | loss scale: 8192.0 | grad norm: 167317.809 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5629/ 159576 | consumed samples: 155504 | elapsed time per iteration (ms): 16829.3 | learning rate: 4.303E-05 | global batch size: 64 | lm loss: 6.365539E+00 | loss scale: 8192.0 | grad norm: 150073.382 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5630/ 159576 | consumed samples: 155568 | elapsed time per iteration (ms): 16533.0 | learning rate: 4.305E-05 | global batch size: 64 | lm loss: 6.385098E+00 | loss scale: 8192.0 | grad norm: 132923.540 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5631/ 159576 | consumed samples: 155632 | elapsed time per iteration (ms): 16451.7 | learning rate: 4.307E-05 | global batch size: 64 | lm loss: 6.314290E+00 | loss scale: 8192.0 | grad norm: 178222.521 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5632/ 159576 | consumed samples: 155696 | elapsed time per iteration (ms): 16400.8 | learning rate: 4.308E-05 | global batch size: 64 | lm loss: 6.467572E+00 | loss scale: 8192.0 | grad norm: 147545.253 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5633/ 159576 | consumed samples: 155760 | elapsed time per iteration (ms): 16566.1 | learning rate: 4.310E-05 | global batch size: 64 | lm loss: 6.341013E+00 | loss scale: 8192.0 | grad norm: 200712.657 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5634/ 159576 | consumed samples: 155824 | elapsed time per iteration (ms): 16393.9 | learning rate: 4.312E-05 | global batch size: 64 | lm loss: 6.319093E+00 | loss scale: 8192.0 | grad norm: 161666.056 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5635/ 159576 | consumed samples: 155888 | elapsed time per iteration (ms): 16416.9 | learning 
rate: 4.314E-05 | global batch size: 64 | lm loss: 6.461274E+00 | loss scale: 8192.0 | grad norm: 572124.260 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5636/ 159576 | consumed samples: 155952 | elapsed time per iteration (ms): 16756.4 | learning rate: 4.315E-05 | global batch size: 64 | lm loss: 6.453969E+00 | loss scale: 8192.0 | grad norm: 205582.781 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5637/ 159576 | consumed samples: 156016 | elapsed time per iteration (ms): 16349.2 | learning rate: 4.317E-05 | global batch size: 64 | lm loss: 6.386354E+00 | loss scale: 8192.0 | grad norm: 188662.234 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5638/ 159576 | consumed samples: 156080 | elapsed time per iteration (ms): 16437.2 | learning rate: 4.319E-05 | global batch size: 64 | lm loss: 6.458478E+00 | loss scale: 8192.0 | grad norm: 208129.298 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5639/ 159576 | consumed samples: 156144 | elapsed time per iteration (ms): 16478.4 | learning rate: 4.321E-05 | global batch size: 64 | lm loss: 6.361111E+00 | loss scale: 8192.0 | grad norm: 383224.074 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5640/ 159576 | consumed samples: 156208 | elapsed time per iteration (ms): 16543.3 | learning rate: 4.322E-05 | global batch size: 64 | lm loss: 6.470639E+00 | loss scale: 8192.0 | grad norm: 244281.048 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5641/ 159576 | consumed samples: 156272 | elapsed time per iteration (ms): 16418.6 | learning rate: 4.324E-05 | global batch size: 64 | lm loss: 6.453573E+00 | loss scale: 8192.0 | grad norm: 246555.042 | num zeros: 0.0 | number of skipped iterations: 0 | 
number of nan iterations: 0 | time (ms) iteration 5642/ 159576 | consumed samples: 156336 | elapsed time per iteration (ms): 16347.0 | learning rate: 4.326E-05 | global batch size: 64 | lm loss: 6.416644E+00 | loss scale: 8192.0 | grad norm: 177394.161 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5643/ 159576 | consumed samples: 156400 | elapsed time per iteration (ms): 9564.0 | learning rate: 4.326E-05 | global batch size: 64 | lm loss: 6.433064E+00 | loss scale: 4096.0 | grad norm: 177394.161 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5644/ 159576 | consumed samples: 156464 | elapsed time per iteration (ms): 16246.5 | learning rate: 4.328E-05 | global batch size: 64 | lm loss: 6.334921E+00 | loss scale: 4096.0 | grad norm: 91031.712 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5645/ 159576 | consumed samples: 156528 | elapsed time per iteration (ms): 16410.8 | learning rate: 4.330E-05 | global batch size: 64 | lm loss: 6.398224E+00 | loss scale: 4096.0 | grad norm: 82899.277 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5646/ 159576 | consumed samples: 156592 | elapsed time per iteration (ms): 16332.5 | learning rate: 4.331E-05 | global batch size: 64 | lm loss: 6.469447E+00 | loss scale: 4096.0 | grad norm: 93235.700 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5647/ 159576 | consumed samples: 156656 | elapsed time per iteration (ms): 16380.9 | learning rate: 4.333E-05 | global batch size: 64 | lm loss: 6.414939E+00 | loss scale: 4096.0 | grad norm: 98498.938 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5648/ 159576 | consumed samples: 156720 | elapsed time per iteration (ms): 16453.9 | learning rate: 4.335E-05 | global 
batch size: 64 | lm loss: 6.435335E+00 | loss scale: 4096.0 | grad norm: 110431.475 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5649/ 159576 | consumed samples: 156784 | elapsed time per iteration (ms): 16375.1 | learning rate: 4.337E-05 | global batch size: 64 | lm loss: 6.367991E+00 | loss scale: 4096.0 | grad norm: 112025.804 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5650/ 159576 | consumed samples: 156848 | elapsed time per iteration (ms): 16396.5 | learning rate: 4.338E-05 | global batch size: 64 | lm loss: 6.453450E+00 | loss scale: 4096.0 | grad norm: 142538.254 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5651/ 159576 | consumed samples: 156912 | elapsed time per iteration (ms): 16662.1 | learning rate: 4.340E-05 | global batch size: 64 | lm loss: 6.376512E+00 | loss scale: 4096.0 | grad norm: 104884.454 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5652/ 159576 | consumed samples: 156976 | elapsed time per iteration (ms): 16397.7 | learning rate: 4.342E-05 | global batch size: 64 | lm loss: 6.398083E+00 | loss scale: 4096.0 | grad norm: 97434.412 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5653/ 159576 | consumed samples: 157040 | elapsed time per iteration (ms): 16367.3 | learning rate: 4.344E-05 | global batch size: 64 | lm loss: 6.468301E+00 | loss scale: 4096.0 | grad norm: 189503.731 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5654/ 159576 | consumed samples: 157104 | elapsed time per iteration (ms): 16332.7 | learning rate: 4.346E-05 | global batch size: 64 | lm loss: 6.449702E+00 | loss scale: 4096.0 | grad norm: 101635.410 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 
| time (ms) iteration 5655/ 159576 | consumed samples: 157168 | elapsed time per iteration (ms): 16814.3 | learning rate: 4.347E-05 | global batch size: 64 | lm loss: 6.417078E+00 | loss scale: 4096.0 | grad norm: 163445.588 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5656/ 159576 | consumed samples: 157232 | elapsed time per iteration (ms): 16304.4 | learning rate: 4.349E-05 | global batch size: 64 | lm loss: 6.445296E+00 | loss scale: 4096.0 | grad norm: 90409.939 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5657/ 159576 | consumed samples: 157296 | elapsed time per iteration (ms): 16400.9 | learning rate: 4.351E-05 | global batch size: 64 | lm loss: 6.445564E+00 | loss scale: 4096.0 | grad norm: 81513.234 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5658/ 159576 | consumed samples: 157360 | elapsed time per iteration (ms): 16340.5 | learning rate: 4.353E-05 | global batch size: 64 | lm loss: 6.333720E+00 | loss scale: 4096.0 | grad norm: 134428.283 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5659/ 159576 | consumed samples: 157424 | elapsed time per iteration (ms): 16553.5 | learning rate: 4.354E-05 | global batch size: 64 | lm loss: 6.401426E+00 | loss scale: 4096.0 | grad norm: 106022.946 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5660/ 159576 | consumed samples: 157488 | elapsed time per iteration (ms): 16387.3 | learning rate: 4.356E-05 | global batch size: 64 | lm loss: 6.438997E+00 | loss scale: 4096.0 | grad norm: 83955.207 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5661/ 159576 | consumed samples: 157552 | elapsed time per iteration (ms): 16456.3 | learning rate: 4.358E-05 | global batch size: 64 | lm loss: 
6.402083E+00 | loss scale: 4096.0 | grad norm: 85068.294 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5662/ 159576 | consumed samples: 157616 | elapsed time per iteration (ms): 16696.8 | learning rate: 4.360E-05 | global batch size: 64 | lm loss: 6.441435E+00 | loss scale: 4096.0 | grad norm: 101578.461 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5663/ 159576 | consumed samples: 157680 | elapsed time per iteration (ms): 16497.3 | learning rate: 4.362E-05 | global batch size: 64 | lm loss: 6.405056E+00 | loss scale: 4096.0 | grad norm: 90814.200 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5664/ 159576 | consumed samples: 157744 | elapsed time per iteration (ms): 16393.8 | learning rate: 4.363E-05 | global batch size: 64 | lm loss: 6.437488E+00 | loss scale: 4096.0 | grad norm: 99258.240 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5665/ 159576 | consumed samples: 157808 | elapsed time per iteration (ms): 16464.8 | learning rate: 4.365E-05 | global batch size: 64 | lm loss: 6.461691E+00 | loss scale: 4096.0 | grad norm: 150615.188 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5666/ 159576 | consumed samples: 157872 | elapsed time per iteration (ms): 16442.6 | learning rate: 4.367E-05 | global batch size: 64 | lm loss: 6.379485E+00 | loss scale: 4096.0 | grad norm: 87553.112 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5667/ 159576 | consumed samples: 157936 | elapsed time per iteration (ms): 16408.0 | learning rate: 4.369E-05 | global batch size: 64 | lm loss: 6.436778E+00 | loss scale: 4096.0 | grad norm: 86837.058 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5668/ 
159576 | consumed samples: 158000 | elapsed time per iteration (ms): 16382.6 | learning rate: 4.370E-05 | global batch size: 64 | lm loss: 6.456222E+00 | loss scale: 4096.0 | grad norm: 81561.808 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5669/ 159576 | consumed samples: 158064 | elapsed time per iteration (ms): 16606.9 | learning rate: 4.372E-05 | global batch size: 64 | lm loss: 6.394565E+00 | loss scale: 4096.0 | grad norm: 90655.669 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5670/ 159576 | consumed samples: 158128 | elapsed time per iteration (ms): 16482.0 | learning rate: 4.374E-05 | global batch size: 64 | lm loss: 6.388999E+00 | loss scale: 4096.0 | grad norm: 139861.145 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5671/ 159576 | consumed samples: 158192 | elapsed time per iteration (ms): 16430.2 | learning rate: 4.376E-05 | global batch size: 64 | lm loss: 6.348672E+00 | loss scale: 4096.0 | grad norm: 79933.242 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5672/ 159576 | consumed samples: 158256 | elapsed time per iteration (ms): 16343.5 | learning rate: 4.378E-05 | global batch size: 64 | lm loss: 6.358377E+00 | loss scale: 4096.0 | grad norm: 91907.327 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5673/ 159576 | consumed samples: 158320 | elapsed time per iteration (ms): 16738.6 | learning rate: 4.379E-05 | global batch size: 64 | lm loss: 6.397278E+00 | loss scale: 4096.0 | grad norm: 81347.015 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5674/ 159576 | consumed samples: 158384 | elapsed time per iteration (ms): 16377.1 | learning rate: 4.381E-05 | global batch size: 64 | lm loss: 6.330511E+00 | loss scale: 
4096.0 | grad norm: 87623.840 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5675/ 159576 | consumed samples: 158448 | elapsed time per iteration (ms): 16376.8 | learning rate: 4.383E-05 | global batch size: 64 | lm loss: 6.400737E+00 | loss scale: 4096.0 | grad norm: 86243.502 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5676/ 159576 | consumed samples: 158512 | elapsed time per iteration (ms): 16407.2 | learning rate: 4.385E-05 | global batch size: 64 | lm loss: 6.373343E+00 | loss scale: 4096.0 | grad norm: 112233.960 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5677/ 159576 | consumed samples: 158576 | elapsed time per iteration (ms): 16504.3 | learning rate: 4.386E-05 | global batch size: 64 | lm loss: 6.340403E+00 | loss scale: 4096.0 | grad norm: 87545.481 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5678/ 159576 | consumed samples: 158640 | elapsed time per iteration (ms): 16469.6 | learning rate: 4.388E-05 | global batch size: 64 | lm loss: 6.483582E+00 | loss scale: 4096.0 | grad norm: 85898.534 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5679/ 159576 | consumed samples: 158704 | elapsed time per iteration (ms): 16363.2 | learning rate: 4.390E-05 | global batch size: 64 | lm loss: 6.384809E+00 | loss scale: 4096.0 | grad norm: 75822.296 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5680/ 159576 | consumed samples: 158768 | elapsed time per iteration (ms): 16705.5 | learning rate: 4.392E-05 | global batch size: 64 | lm loss: 6.360985E+00 | loss scale: 4096.0 | grad norm: 93411.572 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5681/ 159576 | consumed samples: 
158832 | elapsed time per iteration (ms): 16533.6 | learning rate: 4.393E-05 | global batch size: 64 | lm loss: 6.346332E+00 | loss scale: 4096.0 | grad norm: 98347.186 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5682/ 159576 | consumed samples: 158896 | elapsed time per iteration (ms): 16424.8 | learning rate: 4.395E-05 | global batch size: 64 | lm loss: 6.452760E+00 | loss scale: 4096.0 | grad norm: 113842.784 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5683/ 159576 | consumed samples: 158960 | elapsed time per iteration (ms): 16412.1 | learning rate: 4.397E-05 | global batch size: 64 | lm loss: 6.394449E+00 | loss scale: 4096.0 | grad norm: 225192.085 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5684/ 159576 | consumed samples: 159024 | elapsed time per iteration (ms): 16934.4 | learning rate: 4.399E-05 | global batch size: 64 | lm loss: 6.394941E+00 | loss scale: 4096.0 | grad norm: 81396.577 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5685/ 159576 | consumed samples: 159088 | elapsed time per iteration (ms): 16454.0 | learning rate: 4.401E-05 | global batch size: 64 | lm loss: 6.261321E+00 | loss scale: 4096.0 | grad norm: 86149.759 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5686/ 159576 | consumed samples: 159152 | elapsed time per iteration (ms): 16431.5 | learning rate: 4.402E-05 | global batch size: 64 | lm loss: 6.492159E+00 | loss scale: 4096.0 | grad norm: 119300.666 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5687/ 159576 | consumed samples: 159216 | elapsed time per iteration (ms): 16386.6 | learning rate: 4.404E-05 | global batch size: 64 | lm loss: 6.511878E+00 | loss scale: 4096.0 | grad norm: 
91338.030 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5688/ 159576 | consumed samples: 159280 | elapsed time per iteration (ms): 16584.3 | learning rate: 4.406E-05 | global batch size: 64 | lm loss: 6.442091E+00 | loss scale: 4096.0 | grad norm: 127329.065 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5689/ 159576 | consumed samples: 159344 | elapsed time per iteration (ms): 16414.9 | learning rate: 4.408E-05 | global batch size: 64 | lm loss: 6.445393E+00 | loss scale: 4096.0 | grad norm: 74818.326 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5690/ 159576 | consumed samples: 159408 | elapsed time per iteration (ms): 16438.8 | learning rate: 4.409E-05 | global batch size: 64 | lm loss: 6.349149E+00 | loss scale: 4096.0 | grad norm: 90721.765 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5691/ 159576 | consumed samples: 159472 | elapsed time per iteration (ms): 16762.3 | learning rate: 4.411E-05 | global batch size: 64 | lm loss: 6.450273E+00 | loss scale: 4096.0 | grad norm: 84948.864 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5692/ 159576 | consumed samples: 159536 | elapsed time per iteration (ms): 16461.8 | learning rate: 4.413E-05 | global batch size: 64 | lm loss: 6.451497E+00 | loss scale: 4096.0 | grad norm: 160376.410 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5693/ 159576 | consumed samples: 159600 | elapsed time per iteration (ms): 16376.8 | learning rate: 4.415E-05 | global batch size: 64 | lm loss: 6.414182E+00 | loss scale: 4096.0 | grad norm: 64931.477 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5694/ 159576 | consumed samples: 159664 | elapsed time 
per iteration (ms): 16448.9 | learning rate: 4.417E-05 | global batch size: 64 | lm loss: 6.392116E+00 | loss scale: 4096.0 | grad norm: 82604.441 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5695/ 159576 | consumed samples: 159728 | elapsed time per iteration (ms): 16621.3 | learning rate: 4.418E-05 | global batch size: 64 | lm loss: 6.379553E+00 | loss scale: 4096.0 | grad norm: 96286.790 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5696/ 159576 | consumed samples: 159792 | elapsed time per iteration (ms): 16447.4 | learning rate: 4.420E-05 | global batch size: 64 | lm loss: 6.319911E+00 | loss scale: 4096.0 | grad norm: 113489.439 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5697/ 159576 | consumed samples: 159856 | elapsed time per iteration (ms): 16402.2 | learning rate: 4.422E-05 | global batch size: 64 | lm loss: 6.538674E+00 | loss scale: 4096.0 | grad norm: 88380.285 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5698/ 159576 | consumed samples: 159920 | elapsed time per iteration (ms): 16462.3 | learning rate: 4.424E-05 | global batch size: 64 | lm loss: 6.450464E+00 | loss scale: 4096.0 | grad norm: 86213.288 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5699/ 159576 | consumed samples: 159984 | elapsed time per iteration (ms): 16600.9 | learning rate: 4.425E-05 | global batch size: 64 | lm loss: 6.360521E+00 | loss scale: 4096.0 | grad norm: 102839.632 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5700/ 159576 | consumed samples: 160048 | elapsed time per iteration (ms): 16472.4 | learning rate: 4.427E-05 | global batch size: 64 | lm loss: 6.408503E+00 | loss scale: 4096.0 | grad norm: 93708.692 | num zeros: 0.0 | 
number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) [2021-09-25 01:07:57] PULSE: tr8-104B is waiting for the previous job to finish before scheduling a new one using the dependency mechanism (1165978_[1-10%1] on 'gpu_p13' partition) [2021-09-25 01:07:57] PULSE: tr8-104B is running for 19:15:46 since 2021-09-24T05:52:11 (1162855_1 on 'gpu_p13' partition (r6i4n[5,7],r6i5n[2,7-8],r6i6n[0,2,6],r7i2n[4-5],r7i6n[2-4],r7i7n[7-8],r8i0n[2-3,5-8],r8i1n[0,2-4],r8i2n8,r8i3n[0-2],r8i5n[3-4],r8i7n[3-8],r9i0n[0-2],r9i1n[0-3],r9i2n[3-5,8],r9i3n[0-1,7-8],r9i4n[0-2],r9i5n[3-8],r9i6n[0,7-8]) iteration 5701/ 159576 | consumed samples: 160112 | elapsed time per iteration (ms): 16355.6 | learning rate: 4.429E-05 | global batch size: 64 | lm loss: 6.383047E+00 | loss scale: 4096.0 | grad norm: 277390.380 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5702/ 159576 | consumed samples: 160176 | elapsed time per iteration (ms): 16761.7 | learning rate: 4.431E-05 | global batch size: 64 | lm loss: 6.450840E+00 | loss scale: 4096.0 | grad norm: 91541.343 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5703/ 159576 | consumed samples: 160240 | elapsed time per iteration (ms): 9560.9 | learning rate: 4.431E-05 | global batch size: 64 | lm loss: 6.493016E+00 | loss scale: 2048.0 | grad norm: 91541.343 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5704/ 159576 | consumed samples: 160304 | elapsed time per iteration (ms): 15881.2 | learning rate: 4.433E-05 | global batch size: 64 | lm loss: 6.513262E+00 | loss scale: 2048.0 | grad norm: 63292.650 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5705/ 159576 | consumed samples: 160368 | elapsed time per iteration (ms): 16396.1 | learning rate: 4.434E-05 | global batch size: 64 | lm loss: 6.341697E+00 | loss 
scale: 2048.0 | grad norm: 49175.756 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5706/ 159576 | consumed samples: 160432 | elapsed time per iteration (ms): 16742.1 | learning rate: 4.436E-05 | global batch size: 64 | lm loss: 6.376310E+00 | loss scale: 2048.0 | grad norm: 49500.870 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5707/ 159576 | consumed samples: 160496 | elapsed time per iteration (ms): 16502.9 | learning rate: 4.438E-05 | global batch size: 64 | lm loss: 6.305195E+00 | loss scale: 2048.0 | grad norm: 66863.451 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5708/ 159576 | consumed samples: 160560 | elapsed time per iteration (ms): 16427.2 | learning rate: 4.440E-05 | global batch size: 64 | lm loss: 6.338213E+00 | loss scale: 2048.0 | grad norm: 49886.489 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5709/ 159576 | consumed samples: 160624 | elapsed time per iteration (ms): 16430.3 | learning rate: 4.441E-05 | global batch size: 64 | lm loss: 6.403567E+00 | loss scale: 2048.0 | grad norm: 67050.774 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5710/ 159576 | consumed samples: 160688 | elapsed time per iteration (ms): 16701.6 | learning rate: 4.443E-05 | global batch size: 64 | lm loss: 6.365169E+00 | loss scale: 2048.0 | grad norm: 65553.235 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5711/ 159576 | consumed samples: 160752 | elapsed time per iteration (ms): 16495.7 | learning rate: 4.445E-05 | global batch size: 64 | lm loss: 6.437389E+00 | loss scale: 2048.0 | grad norm: 42948.956 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5712/ 159576 | consumed 
samples: 160816 | elapsed time per iteration (ms): 16396.0 | learning rate: 4.447E-05 | global batch size: 64 | lm loss: 6.359374E+00 | loss scale: 2048.0 | grad norm: 47459.652 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5713/ 159576 | consumed samples: 160880 | elapsed time per iteration (ms): 16399.1 | learning rate: 4.449E-05 | global batch size: 64 | lm loss: 6.384996E+00 | loss scale: 2048.0 | grad norm: 54873.542 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5714/ 159576 | consumed samples: 160944 | elapsed time per iteration (ms): 16655.8 | learning rate: 4.450E-05 | global batch size: 64 | lm loss: 6.407744E+00 | loss scale: 2048.0 | grad norm: 49484.496 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5715/ 159576 | consumed samples: 161008 | elapsed time per iteration (ms): 16395.3 | learning rate: 4.452E-05 | global batch size: 64 | lm loss: 6.596529E+00 | loss scale: 2048.0 | grad norm: 56205.082 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5716/ 159576 | consumed samples: 161072 | elapsed time per iteration (ms): 16464.0 | learning rate: 4.454E-05 | global batch size: 64 | lm loss: 6.421166E+00 | loss scale: 2048.0 | grad norm: 62635.742 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5717/ 159576 | consumed samples: 161136 | elapsed time per iteration (ms): 16725.6 | learning rate: 4.456E-05 | global batch size: 64 | lm loss: 6.470579E+00 | loss scale: 2048.0 | grad norm: 63421.257 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5718/ 159576 | consumed samples: 161200 | elapsed time per iteration (ms): 16562.5 | learning rate: 4.457E-05 | global batch size: 64 | lm loss: 6.431957E+00 | loss scale: 2048.0 | grad norm: 
41629.913 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5719/ 159576 | consumed samples: 161264 | elapsed time per iteration (ms): 16447.6 | learning rate: 4.459E-05 | global batch size: 64 | lm loss: 6.372540E+00 | loss scale: 2048.0 | grad norm: 52749.135 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5720/ 159576 | consumed samples: 161328 | elapsed time per iteration (ms): 16436.0 | learning rate: 4.461E-05 | global batch size: 64 | lm loss: 6.376571E+00 | loss scale: 2048.0 | grad norm: 152378.164 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5721/ 159576 | consumed samples: 161392 | elapsed time per iteration (ms): 16522.7 | learning rate: 4.463E-05 | global batch size: 64 | lm loss: 6.346034E+00 | loss scale: 2048.0 | grad norm: 79170.187 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5722/ 159576 | consumed samples: 161456 | elapsed time per iteration (ms): 16447.7 | learning rate: 4.464E-05 | global batch size: 64 | lm loss: 6.379195E+00 | loss scale: 2048.0 | grad norm: 54035.991 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5723/ 159576 | consumed samples: 161520 | elapsed time per iteration (ms): 16383.8 | learning rate: 4.466E-05 | global batch size: 64 | lm loss: 6.410875E+00 | loss scale: 2048.0 | grad norm: 122622.327 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5724/ 159576 | consumed samples: 161584 | elapsed time per iteration (ms): 16762.9 | learning rate: 4.468E-05 | global batch size: 64 | lm loss: 6.426128E+00 | loss scale: 2048.0 | grad norm: 61346.953 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5725/ 159576 | consumed samples: 161648 | elapsed time 
per iteration (ms): 16455.6 | learning rate: 4.470E-05 | global batch size: 64 | lm loss: 6.440339E+00 | loss scale: 2048.0 | grad norm: 114917.377 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5726/ 159576 | consumed samples: 161712 | elapsed time per iteration (ms): 16491.5 | learning rate: 4.472E-05 | global batch size: 64 | lm loss: 6.229801E+00 | loss scale: 2048.0 | grad norm: 43861.570 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5727/ 159576 | consumed samples: 161776 | elapsed time per iteration (ms): 16434.9 | learning rate: 4.473E-05 | global batch size: 64 | lm loss: 6.503794E+00 | loss scale: 2048.0 | grad norm: 59176.822 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5728/ 159576 | consumed samples: 161840 | elapsed time per iteration (ms): 16686.0 | learning rate: 4.475E-05 | global batch size: 64 | lm loss: 6.415756E+00 | loss scale: 2048.0 | grad norm: 62124.293 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5729/ 159576 | consumed samples: 161904 | elapsed time per iteration (ms): 16403.6 | learning rate: 4.477E-05 | global batch size: 64 | lm loss: 6.457495E+00 | loss scale: 2048.0 | grad norm: 56507.999 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5730/ 159576 | consumed samples: 161968 | elapsed time per iteration (ms): 16426.6 | learning rate: 4.479E-05 | global batch size: 64 | lm loss: 6.469141E+00 | loss scale: 2048.0 | grad norm: 61746.631 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5731/ 159576 | consumed samples: 162032 | elapsed time per iteration (ms): 16455.5 | learning rate: 4.480E-05 | global batch size: 64 | lm loss: 6.459309E+00 | loss scale: 2048.0 | grad norm: 59449.114 | num zeros: 0.0 | 
number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5732/ 159576 | consumed samples: 162096 | elapsed time per iteration (ms): 16649.1 | learning rate: 4.482E-05 | global batch size: 64 | lm loss: 6.402276E+00 | loss scale: 2048.0 | grad norm: 46335.687 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5733/ 159576 | consumed samples: 162160 | elapsed time per iteration (ms): 16461.8 | learning rate: 4.484E-05 | global batch size: 64 | lm loss: 6.519283E+00 | loss scale: 2048.0 | grad norm: 66042.113 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5734/ 159576 | consumed samples: 162224 | elapsed time per iteration (ms): 16320.8 | learning rate: 4.486E-05 | global batch size: 64 | lm loss: 6.357197E+00 | loss scale: 2048.0 | grad norm: 86317.077 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5735/ 159576 | consumed samples: 162288 | elapsed time per iteration (ms): 16817.7 | learning rate: 4.488E-05 | global batch size: 64 | lm loss: 6.412820E+00 | loss scale: 2048.0 | grad norm: 68051.158 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5736/ 159576 | consumed samples: 162352 | elapsed time per iteration (ms): 16374.0 | learning rate: 4.489E-05 | global batch size: 64 | lm loss: 6.409474E+00 | loss scale: 2048.0 | grad norm: 52474.381 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5737/ 159576 | consumed samples: 162416 | elapsed time per iteration (ms): 16279.5 | learning rate: 4.491E-05 | global batch size: 64 | lm loss: 6.432059E+00 | loss scale: 2048.0 | grad norm: 60932.044 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5738/ 159576 | consumed samples: 162480 | elapsed time per iteration (ms): 16405.5 | 
learning rate: 4.493E-05 | global batch size: 64 | lm loss: 6.389083E+00 | loss scale: 2048.0 | grad norm: 97554.805 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5739/ 159576 | consumed samples: 162544 | elapsed time per iteration (ms): 16881.2 | learning rate: 4.495E-05 | global batch size: 64 | lm loss: 6.352797E+00 | loss scale: 2048.0 | grad norm: 56410.885 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5740/ 159576 | consumed samples: 162608 | elapsed time per iteration (ms): 16465.8 | learning rate: 4.496E-05 | global batch size: 64 | lm loss: 6.400247E+00 | loss scale: 2048.0 | grad norm: 67543.254 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5741/ 159576 | consumed samples: 162672 | elapsed time per iteration (ms): 16430.8 | learning rate: 4.498E-05 | global batch size: 64 | lm loss: 6.361669E+00 | loss scale: 2048.0 | grad norm: 49133.819 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5742/ 159576 | consumed samples: 162736 | elapsed time per iteration (ms): 16371.1 | learning rate: 4.500E-05 | global batch size: 64 | lm loss: 6.415005E+00 | loss scale: 2048.0 | grad norm: 84089.923 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5743/ 159576 | consumed samples: 162800 | elapsed time per iteration (ms): 16700.6 | learning rate: 4.502E-05 | global batch size: 64 | lm loss: 6.365685E+00 | loss scale: 2048.0 | grad norm: 51630.988 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5744/ 159576 | consumed samples: 162864 | elapsed time per iteration (ms): 16325.3 | learning rate: 4.504E-05 | global batch size: 64 | lm loss: 6.440388E+00 | loss scale: 2048.0 | grad norm: 72309.287 | num zeros: 0.0 | number of skipped iterations: 0 | 
number of nan iterations: 0 | time (ms) iteration 5745/ 159576 | consumed samples: 162928 | elapsed time per iteration (ms): 16329.9 | learning rate: 4.505E-05 | global batch size: 64 | lm loss: 6.466510E+00 | loss scale: 2048.0 | grad norm: 42690.447 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5746/ 159576 | consumed samples: 162992 | elapsed time per iteration (ms): 16621.4 | learning rate: 4.507E-05 | global batch size: 64 | lm loss: 6.487222E+00 | loss scale: 2048.0 | grad norm: 71804.170 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5747/ 159576 | consumed samples: 163056 | elapsed time per iteration (ms): 16495.0 | learning rate: 4.509E-05 | global batch size: 64 | lm loss: 6.362286E+00 | loss scale: 2048.0 | grad norm: 86678.801 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5748/ 159576 | consumed samples: 163120 | elapsed time per iteration (ms): 16346.4 | learning rate: 4.511E-05 | global batch size: 64 | lm loss: 6.356483E+00 | loss scale: 2048.0 | grad norm: 59964.749 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5749/ 159576 | consumed samples: 163184 | elapsed time per iteration (ms): 16441.6 | learning rate: 4.512E-05 | global batch size: 64 | lm loss: 6.417390E+00 | loss scale: 2048.0 | grad norm: 50380.263 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5750/ 159576 | consumed samples: 163248 | elapsed time per iteration (ms): 16658.5 | learning rate: 4.514E-05 | global batch size: 64 | lm loss: 6.274541E+00 | loss scale: 2048.0 | grad norm: 39059.607 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5751/ 159576 | consumed samples: 163312 | elapsed time per iteration (ms): 16405.5 | learning rate: 4.516E-05 | global 
batch size: 64 | lm loss: 6.367218E+00 | loss scale: 2048.0 | grad norm: 51183.207 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5752/ 159576 | consumed samples: 163376 | elapsed time per iteration (ms): 16320.2 | learning rate: 4.518E-05 | global batch size: 64 | lm loss: 6.344701E+00 | loss scale: 2048.0 | grad norm: 36962.457 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5753/ 159576 | consumed samples: 163440 | elapsed time per iteration (ms): 16390.0 | learning rate: 4.520E-05 | global batch size: 64 | lm loss: 6.400953E+00 | loss scale: 2048.0 | grad norm: 66022.407 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5754/ 159576 | consumed samples: 163504 | elapsed time per iteration (ms): 16546.1 | learning rate: 4.521E-05 | global batch size: 64 | lm loss: 6.378292E+00 | loss scale: 2048.0 | grad norm: 51492.219 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5755/ 159576 | consumed samples: 163568 | elapsed time per iteration (ms): 16433.9 | learning rate: 4.523E-05 | global batch size: 64 | lm loss: 6.447009E+00 | loss scale: 2048.0 | grad norm: 67150.422 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5756/ 159576 | consumed samples: 163632 | elapsed time per iteration (ms): 16359.3 | learning rate: 4.525E-05 | global batch size: 64 | lm loss: 6.393310E+00 | loss scale: 2048.0 | grad norm: 47124.929 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5757/ 159576 | consumed samples: 163696 | elapsed time per iteration (ms): 16714.1 | learning rate: 4.527E-05 | global batch size: 64 | lm loss: 6.428847E+00 | loss scale: 2048.0 | grad norm: 73984.124 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | 
time (ms) iteration 5758/ 159576 | consumed samples: 163760 | elapsed time per iteration (ms): 16285.5 | learning rate: 4.528E-05 | global batch size: 64 | lm loss: 6.410369E+00 | loss scale: 2048.0 | grad norm: 51894.933 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5759/ 159576 | consumed samples: 163824 | elapsed time per iteration (ms): 16346.5 | learning rate: 4.530E-05 | global batch size: 64 | lm loss: 6.361977E+00 | loss scale: 2048.0 | grad norm: 46022.549 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5760/ 159576 | consumed samples: 163888 | elapsed time per iteration (ms): 16363.4 | learning rate: 4.532E-05 | global batch size: 64 | lm loss: 6.411450E+00 | loss scale: 2048.0 | grad norm: 62804.958 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5761/ 159576 | consumed samples: 163952 | elapsed time per iteration (ms): 16576.6 | learning rate: 4.534E-05 | global batch size: 64 | lm loss: 6.492290E+00 | loss scale: 2048.0 | grad norm: 91376.279 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5762/ 159576 | consumed samples: 164016 | elapsed time per iteration (ms): 16429.0 | learning rate: 4.536E-05 | global batch size: 64 | lm loss: 6.351690E+00 | loss scale: 2048.0 | grad norm: 56460.123 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5763/ 159576 | consumed samples: 164080 | elapsed time per iteration (ms): 16419.8 | learning rate: 4.537E-05 | global batch size: 64 | lm loss: 6.388021E+00 | loss scale: 2048.0 | grad norm: 48184.276 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5764/ 159576 | consumed samples: 164144 | elapsed time per iteration (ms): 16346.0 | learning rate: 4.539E-05 | global batch size: 64 | lm loss: 
6.500803E+00 | loss scale: 2048.0 | grad norm: 47702.715 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5765/ 159576 | consumed samples: 164208 | elapsed time per iteration (ms): 16601.8 | learning rate: 4.541E-05 | global batch size: 64 | lm loss: 6.377601E+00 | loss scale: 2048.0 | grad norm: 52558.168 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5766/ 159576 | consumed samples: 164272 | elapsed time per iteration (ms): 16306.8 | learning rate: 4.543E-05 | global batch size: 64 | lm loss: 6.348913E+00 | loss scale: 2048.0 | grad norm: 75335.243 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5767/ 159576 | consumed samples: 164336 | elapsed time per iteration (ms): 16391.8 | learning rate: 4.544E-05 | global batch size: 64 | lm loss: 6.287434E+00 | loss scale: 2048.0 | grad norm: 51886.097 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5768/ 159576 | consumed samples: 164400 | elapsed time per iteration (ms): 16644.5 | learning rate: 4.546E-05 | global batch size: 64 | lm loss: 6.409395E+00 | loss scale: 2048.0 | grad norm: 59368.924 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5769/ 159576 | consumed samples: 164464 | elapsed time per iteration (ms): 16355.1 | learning rate: 4.548E-05 | global batch size: 64 | lm loss: 6.376360E+00 | loss scale: 2048.0 | grad norm: 45775.427 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5770/ 159576 | consumed samples: 164528 | elapsed time per iteration (ms): 16317.3 | learning rate: 4.550E-05 | global batch size: 64 | lm loss: 6.428416E+00 | loss scale: 2048.0 | grad norm: 53234.486 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5771/ 
159576 | consumed samples: 164592 | elapsed time per iteration (ms): 16327.7 | learning rate: 4.551E-05 | global batch size: 64 | lm loss: 6.374567E+00 | loss scale: 2048.0 | grad norm: 44963.056 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5772/ 159576 | consumed samples: 164656 | elapsed time per iteration (ms): 16674.7 | learning rate: 4.553E-05 | global batch size: 64 | lm loss: 6.357097E+00 | loss scale: 2048.0 | grad norm: 47484.231 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5773/ 159576 | consumed samples: 164720 | elapsed time per iteration (ms): 16463.9 | learning rate: 4.555E-05 | global batch size: 64 | lm loss: 6.398357E+00 | loss scale: 2048.0 | grad norm: 41638.418 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5774/ 159576 | consumed samples: 164784 | elapsed time per iteration (ms): 16348.7 | learning rate: 4.557E-05 | global batch size: 64 | lm loss: 6.351582E+00 | loss scale: 2048.0 | grad norm: 54903.850 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5775/ 159576 | consumed samples: 164848 | elapsed time per iteration (ms): 16736.5 | learning rate: 4.559E-05 | global batch size: 64 | lm loss: 6.367338E+00 | loss scale: 2048.0 | grad norm: 43171.778 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5776/ 159576 | consumed samples: 164912 | elapsed time per iteration (ms): 16420.4 | learning rate: 4.560E-05 | global batch size: 64 | lm loss: 6.386267E+00 | loss scale: 2048.0 | grad norm: 68637.095 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5777/ 159576 | consumed samples: 164976 | elapsed time per iteration (ms): 16467.1 | learning rate: 4.562E-05 | global batch size: 64 | lm loss: 6.368368E+00 | loss scale: 
2048.0 | grad norm: 47557.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5778/ 159576 | consumed samples: 165040 | elapsed time per iteration (ms): 16383.6 | learning rate: 4.564E-05 | global batch size: 64 | lm loss: 6.360928E+00 | loss scale: 2048.0 | grad norm: 48661.439 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5779/ 159576 | consumed samples: 165104 | elapsed time per iteration (ms): 16795.3 | learning rate: 4.566E-05 | global batch size: 64 | lm loss: 6.286585E+00 | loss scale: 2048.0 | grad norm: 41957.074 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5780/ 159576 | consumed samples: 165168 | elapsed time per iteration (ms): 16414.6 | learning rate: 4.567E-05 | global batch size: 64 | lm loss: 6.329445E+00 | loss scale: 2048.0 | grad norm: 58532.760 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5781/ 159576 | consumed samples: 165232 | elapsed time per iteration (ms): 16413.2 | learning rate: 4.569E-05 | global batch size: 64 | lm loss: 6.447413E+00 | loss scale: 2048.0 | grad norm: 58971.422 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5782/ 159576 | consumed samples: 165296 | elapsed time per iteration (ms): 16345.1 | learning rate: 4.571E-05 | global batch size: 64 | lm loss: 6.367276E+00 | loss scale: 2048.0 | grad norm: 62853.125 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5783/ 159576 | consumed samples: 165360 | elapsed time per iteration (ms): 16700.8 | learning rate: 4.573E-05 | global batch size: 64 | lm loss: 6.394166E+00 | loss scale: 2048.0 | grad norm: 104426.360 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5784/ 159576 | consumed samples: 
165424 | elapsed time per iteration (ms): 16276.5 | learning rate: 4.575E-05 | global batch size: 64 | lm loss: 6.447882E+00 | loss scale: 2048.0 | grad norm: 50564.392 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
 iteration 5785/ 159576 | consumed samples: 165488 | elapsed time per iteration (ms): 16423.7 | learning rate: 4.576E-05 | global batch size: 64 | lm loss: 6.341421E+00 | loss scale: 2048.0 | grad norm: 126331.219 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
 iteration 5786/ 159576 | consumed samples: 165552 | elapsed time per iteration (ms): 16792.0 | learning rate: 4.578E-05 | global batch size: 64 | lm loss: 6.384687E+00 | loss scale: 2048.0 | grad norm: 54058.867 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
 iteration 5787/ 159576 | consumed samples: 165616 | elapsed time per iteration (ms): 16388.2 | learning rate: 4.580E-05 | global batch size: 64 | lm loss: 6.392807E+00 | loss scale: 2048.0 | grad norm: 59371.923 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
 iteration 5788/ 159576 | consumed samples: 165680 | elapsed time per iteration (ms): 16392.6 | learning rate: 4.582E-05 | global batch size: 64 | lm loss: 6.457485E+00 | loss scale: 2048.0 | grad norm: 65736.175 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
 iteration 5789/ 159576 | consumed samples: 165744 | elapsed time per iteration (ms): 16338.9 | learning rate: 4.583E-05 | global batch size: 64 | lm loss: 6.370594E+00 | loss scale: 2048.0 | grad norm: 86846.852 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
 iteration 5790/ 159576 | consumed samples: 165808 | elapsed time per iteration (ms): 16857.0 | learning rate: 4.585E-05 | global batch size: 64 | lm loss: 6.412526E+00 | loss scale: 2048.0 | grad norm: 77325.810 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
 iteration 5791/ 159576 | consumed samples: 165872 | elapsed time per iteration (ms): 16398.4 | learning rate: 4.587E-05 | global batch size: 64 | lm loss: 6.412295E+00 | loss scale: 2048.0 | grad norm: 50166.463 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
 iteration 5792/ 159576 | consumed samples: 165936 | elapsed time per iteration (ms): 16290.5 | learning rate: 4.589E-05 | global batch size: 64 | lm loss: 6.380277E+00 | loss scale: 2048.0 | grad norm: 48226.590 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
 iteration 5793/ 159576 | consumed samples: 166000 | elapsed time per iteration (ms): 16371.0 | learning rate: 4.591E-05 | global batch size: 64 | lm loss: 6.359699E+00 | loss scale: 2048.0 | grad norm: 65168.886 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
 iteration 5794/ 159576 | consumed samples: 166064 | elapsed time per iteration (ms): 16645.3 | learning rate: 4.592E-05 | global batch size: 64 | lm loss: 6.321030E+00 | loss scale: 2048.0 | grad norm: 52186.470 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
 iteration 5795/ 159576 | consumed samples: 166128 | elapsed time per iteration (ms): 16469.4 | learning rate: 4.594E-05 | global batch size: 64 | lm loss: 6.393083E+00 | loss scale: 2048.0 | grad norm: 55272.030 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
 iteration 5796/ 159576 | consumed samples: 166192 | elapsed time per iteration (ms): 16425.9 | learning rate: 4.596E-05 | global batch size: 64 | lm loss: 6.374780E+00 | loss scale: 2048.0 | grad norm: 53939.279 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
 iteration 5797/ 159576 | consumed samples: 166256 | elapsed time per iteration (ms): 16770.7 | learning rate: 4.598E-05 | global batch size: 64 | lm loss: 6.376060E+00 | loss scale: 2048.0 | grad norm: 62276.052 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
 iteration 5798/ 159576 | consumed samples: 166320 | elapsed time per iteration (ms): 16339.0 | learning rate: 4.599E-05 | global batch size: 64 | lm loss: 6.463357E+00 | loss scale: 2048.0 | grad norm: 55276.460 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
 iteration 5799/ 159576 | consumed samples: 166384 | elapsed time per iteration (ms): 16400.6 | learning rate: 4.601E-05 | global batch size: 64 | lm loss: 6.364144E+00 | loss scale: 2048.0 | grad norm: 46941.317 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
 iteration 5800/ 159576 | consumed samples: 166448 | elapsed time per iteration (ms): 16328.3 | learning rate: 4.603E-05 | global batch size: 64 | lm loss: 6.412081E+00 | loss scale: 2048.0 | grad norm: 61281.255 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
 iteration 5801/ 159576 | consumed samples: 166512 | elapsed time per iteration (ms): 16791.0 | learning rate: 4.605E-05 | global batch size: 64 | lm loss: 6.396990E+00 | loss scale: 2048.0 | grad norm: 90543.167 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
 iteration 5802/ 159576 | consumed samples: 166576 | elapsed time per iteration (ms): 16555.9 | learning rate: 4.607E-05 | global batch size: 64 | lm loss: 6.358585E+00 | loss scale: 2048.0 | grad norm: 43097.920 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
 iteration 5803/ 159576 | consumed samples: 166640 | elapsed time per iteration (ms): 16465.5 | learning rate: 4.608E-05 | global batch size: 64 | lm loss: 6.493999E+00 | loss scale: 2048.0 | grad norm: 45567.331 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
 iteration 5804/ 159576 | consumed samples: 166704 | elapsed time per iteration (ms): 16436.4 | learning rate: 4.610E-05 | global batch size: 64 | lm loss: 6.533109E+00 | loss scale: 2048.0 | grad norm: 127288.085 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
 iteration 5805/ 159576 | consumed samples: 166768 | elapsed time per iteration (ms): 16549.3 | learning rate: 4.612E-05 | global batch size: 64 | lm loss: 6.379089E+00 | loss scale: 2048.0 | grad norm: 48002.691 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
 iteration 5806/ 159576 | consumed samples: 166832 | elapsed time per iteration (ms): 16407.1 | learning rate: 4.614E-05 | global batch size: 64 | lm loss: 6.365424E+00 | loss scale: 2048.0 | grad norm: 49891.608 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
 iteration 5807/ 159576 | consumed samples: 166896 | elapsed time per iteration (ms): 16379.2 | learning rate: 4.615E-05 | global batch size: 64 | lm loss: 6.476014E+00 | loss scale: 2048.0 | grad norm: 47532.881 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
 iteration 5808/ 159576 | consumed samples: 166960 | elapsed time per iteration (ms): 16753.6 | learning rate: 4.617E-05 | global batch size: 64 | lm loss: 6.354483E+00 | loss scale: 2048.0 | grad norm: 56392.704 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
 iteration 5809/ 159576 | consumed samples: 167024 | elapsed time per iteration (ms): 16393.4 | learning rate: 4.619E-05 | global batch size: 64 | lm loss: 6.519560E+00 | loss scale: 2048.0 | grad norm: 44344.198 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
 iteration 5810/ 159576 | consumed samples: 167088 | elapsed time per iteration (ms): 16492.5 | learning rate: 4.621E-05 | global batch size: 64 | lm loss: 6.408142E+00 | loss scale: 2048.0 | grad norm: 49620.831 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
 iteration 5811/ 159576 | consumed samples: 167152 | elapsed time per iteration (ms): 16428.1 | learning rate: 4.622E-05 | global batch size: 64 | lm loss: 6.376643E+00 | loss scale: 2048.0 | grad norm: 54930.966 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
 iteration 5812/ 159576 | consumed samples: 167216 | elapsed time per iteration (ms): 16603.5 | learning rate: 4.624E-05 | global batch size: 64 | lm loss: 6.446056E+00 | loss scale: 2048.0 | grad norm: 49991.934 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
 iteration 5813/ 159576 | consumed samples: 167280 | elapsed time per iteration (ms): 16423.7 | learning rate: 4.626E-05 | global batch size: 64 | lm loss: 6.503972E+00 | loss scale: 2048.0 | grad norm: 48324.994 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
 iteration 5814/ 159576 | consumed samples: 167344 | elapsed time per iteration (ms): 16392.6 | learning rate: 4.628E-05 | global batch size: 64 | lm loss: 6.483917E+00 | loss scale: 2048.0 | grad norm: 49344.656 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
 iteration 5815/ 159576 | consumed samples: 167408 | elapsed time per iteration (ms): 16437.6 | learning rate: 4.630E-05 | global batch size: 64 | lm loss: 6.359298E+00 | loss scale: 2048.0 | grad norm: 46826.938 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
 iteration 5816/ 159576 | consumed samples: 167472 | elapsed time per iteration (ms): 16791.2 | learning rate: 4.631E-05 | global batch size: 64 | lm loss: 6.477077E+00 | loss scale: 2048.0 | grad norm: 80606.642 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
 iteration 5817/ 159576 | consumed samples: 167536 | elapsed time per iteration (ms): 16448.9 | learning rate: 4.633E-05 | global batch size: 64 | lm loss: 6.378170E+00 | loss scale: 2048.0 | grad norm: 50159.917 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
 iteration 5818/ 159576 | consumed samples: 167600 | elapsed time per iteration (ms): 16473.7 | learning rate: 4.635E-05 | global batch size: 64 | lm loss: 6.336848E+00 | loss scale: 2048.0 | grad norm: 68729.538 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
 iteration 5819/ 159576 | consumed samples: 167664 | elapsed time per iteration (ms): 16753.1 | learning rate: 4.637E-05 | global batch size: 64 | lm loss: 6.448166E+00 | loss scale: 2048.0 | grad norm: 53348.776 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
 iteration 5820/ 159576 | consumed samples: 167728 | elapsed time per iteration (ms): 16453.7 | learning rate: 4.638E-05 | global batch size: 64 | lm loss: 6.433999E+00 | loss scale: 2048.0 | grad norm: 56781.530 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
 iteration 5821/ 159576 | consumed samples: 167792 | elapsed time per iteration (ms): 16425.7 | learning rate: 4.640E-05 | global batch size: 64 | lm loss: 6.397796E+00 | loss scale: 2048.0 | grad norm: 51600.258 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
 iteration 5822/ 159576 | consumed samples: 167856 | elapsed time per iteration (ms): 16451.4 | learning rate: 4.642E-05 | global batch size: 64 | lm loss: 6.353134E+00 | loss scale: 2048.0 | grad norm: 49519.612 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
 iteration 5823/ 159576 | consumed samples: 167920 | elapsed time per iteration (ms): 16634.5 | learning rate: 4.644E-05 | global batch size: 64 | lm loss: 6.402969E+00 | loss scale: 2048.0 | grad norm: 52985.835 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
 iteration 5824/ 159576 | consumed samples: 167984 | elapsed time per iteration (ms): 16465.1 | learning rate: 4.646E-05 | global batch size: 64 | lm loss: 6.411339E+00 | loss scale: 2048.0 | grad norm: 61452.738 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
 iteration 5825/ 159576 | consumed samples: 168048 | elapsed time per iteration (ms): 16470.5 | learning rate: 4.647E-05 | global batch size: 64 | lm loss: 6.406009E+00 | loss scale: 2048.0 | grad norm: 58959.118 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
 iteration 5826/ 159576 | consumed samples: 168112 | elapsed time per iteration (ms): 16992.0 | learning rate: 4.649E-05 | global batch size: 64 | lm loss: 6.401187E+00 | loss scale: 2048.0 | grad norm: 61581.299 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
 iteration 5827/ 159576 | consumed samples: 168176 | elapsed time per iteration (ms): 16478.9 | learning rate: 4.651E-05 | global batch size: 64 | lm loss: 6.463509E+00 | loss scale: 2048.0 | grad norm: 62591.609 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
saving checkpoint at iteration 5827 to /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints
[2021-09-25 01:42:45,269] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/global_step5827/mp_rank_00_model_states.pt
successfully saved checkpoint at iteration 5827 to /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints
time (ms) | save-checkpoint: 18278.37
[exiting program after 1190.1195526679357 minutes] datetime: 2021-09-25 01:42:58
ERROR:apscheduler.scheduler:Error submitting job
"BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-25 01:43:20 CEST)" to executor "default" Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/schedulers/base.py", line 979, in _process_jobs executor.submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/base.py", line 71, in submit_job self._do_submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/pool.py", line 28, in _do_submit_job f = self._pool.submit(run_job, job, job._jobstore_alias, run_times, self._logger.name) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/concurrent/futures/thread.py", line 179, in submit raise RuntimeError('cannot schedule new futures after shutdown') RuntimeError: cannot schedule new futures after shutdown ERROR:apscheduler.scheduler:Error submitting job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-25 01:43:20 CEST)" to executor "default" Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/schedulers/base.py", line 979, in _process_jobs executor.submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/base.py", line 71, in submit_job self._do_submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/pool.py", line 28, in _do_submit_job f = self._pool.submit(run_job, job, job._jobstore_alias, run_times, self._logger.name) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/concurrent/futures/thread.py", line 179, in submit raise RuntimeError('cannot schedule new futures after shutdown') RuntimeError: cannot schedule new futures after shutdown ERROR:apscheduler.scheduler:Error submitting job 
"BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-25 01:43:20 CEST)" to executor "default" Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/schedulers/base.py", line 979, in _process_jobs executor.submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/base.py", line 71, in submit_job self._do_submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/pool.py", line 28, in _do_submit_job f = self._pool.submit(run_job, job, job._jobstore_alias, run_times, self._logger.name) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/concurrent/futures/thread.py", line 179, in submit raise RuntimeError('cannot schedule new futures after shutdown') RuntimeError: cannot schedule new futures after shutdown ERROR:apscheduler.scheduler:Error submitting job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-25 01:43:20 CEST)" to executor "default" Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/schedulers/base.py", line 979, in _process_jobs executor.submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/base.py", line 71, in submit_job self._do_submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/pool.py", line 28, in _do_submit_job f = self._pool.submit(run_job, job, job._jobstore_alias, run_times, self._logger.name) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/concurrent/futures/thread.py", line 179, in submit raise RuntimeError('cannot schedule new futures after shutdown') RuntimeError: cannot schedule new futures after shutdown ERROR:apscheduler.scheduler:Error submitting job 
"BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-25 01:43:20 CEST)" to executor "default" Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/schedulers/base.py", line 979, in _process_jobs executor.submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/base.py", line 71, in submit_job self._do_submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/pool.py", line 28, in _do_submit_job f = self._pool.submit(run_job, job, job._jobstore_alias, run_times, self._logger.name) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/concurrent/futures/thread.py", line 179, in submit raise RuntimeError('cannot schedule new futures after shutdown') RuntimeError: cannot schedule new futures after shutdown ERROR:apscheduler.scheduler:Error submitting job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-25 01:43:20 CEST)" to executor "default" Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/schedulers/base.py", line 979, in _process_jobs executor.submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/base.py", line 71, in submit_job self._do_submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/pool.py", line 28, in _do_submit_job f = self._pool.submit(run_job, job, job._jobstore_alias, run_times, self._logger.name) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/concurrent/futures/thread.py", line 179, in submit raise RuntimeError('cannot schedule new futures after shutdown') RuntimeError: cannot schedule new futures after shutdown ERROR:apscheduler.scheduler:Error submitting job 
"BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-25 01:43:20 CEST)" to executor "default" Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/schedulers/base.py", line 979, in _process_jobs executor.submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/base.py", line 71, in submit_job self._do_submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/pool.py", line 28, in _do_submit_job f = self._pool.submit(run_job, job, job._jobstore_alias, run_times, self._logger.name) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/concurrent/futures/thread.py", line 179, in submit raise RuntimeError('cannot schedule new futures after shutdown') RuntimeError: cannot schedule new futures after shutdown ERROR:apscheduler.scheduler:Error submitting job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-25 01:43:20 CEST)" to executor "default" Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/schedulers/base.py", line 979, in _process_jobs executor.submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/base.py", line 71, in submit_job self._do_submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/pool.py", line 28, in _do_submit_job f = self._pool.submit(run_job, job, job._jobstore_alias, run_times, self._logger.name) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/concurrent/futures/thread.py", line 179, in submit raise RuntimeError('cannot schedule new futures after shutdown') RuntimeError: cannot schedule new futures after shutdown ERROR:apscheduler.scheduler:Error submitting job 
"BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-25 01:43:20 CEST)" to executor "default" Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/schedulers/base.py", line 979, in _process_jobs executor.submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/base.py", line 71, in submit_job self._do_submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/pool.py", line 28, in _do_submit_job f = self._pool.submit(run_job, job, job._jobstore_alias, run_times, self._logger.name) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/concurrent/futures/thread.py", line 179, in submit raise RuntimeError('cannot schedule new futures after shutdown') RuntimeError: cannot schedule new futures after shutdown ERROR:apscheduler.scheduler:Error submitting job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-25 01:43:20 CEST)" to executor "default" Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/schedulers/base.py", line 979, in _process_jobs executor.submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/base.py", line 71, in submit_job self._do_submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/pool.py", line 28, in _do_submit_job f = self._pool.submit(run_job, job, job._jobstore_alias, run_times, self._logger.name) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/concurrent/futures/thread.py", line 179, in submit raise RuntimeError('cannot schedule new futures after shutdown') RuntimeError: cannot schedule new futures after shutdown ERROR:apscheduler.scheduler:Error submitting job 
"BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-25 01:43:20 CEST)" to executor "default" Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/schedulers/base.py", line 979, in _process_jobs executor.submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/base.py", line 71, in submit_job self._do_submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/pool.py", line 28, in _do_submit_job f = self._pool.submit(run_job, job, job._jobstore_alias, run_times, self._logger.name) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/concurrent/futures/thread.py", line 179, in submit raise RuntimeError('cannot schedule new futures after shutdown') RuntimeError: cannot schedule new futures after shutdown ERROR:apscheduler.scheduler:Error submitting job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-25 01:43:20 CEST)" to executor "default" Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/schedulers/base.py", line 979, in _process_jobs executor.submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/base.py", line 71, in submit_job self._do_submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/pool.py", line 28, in _do_submit_job f = self._pool.submit(run_job, job, job._jobstore_alias, run_times, self._logger.name) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/concurrent/futures/thread.py", line 179, in submit raise RuntimeError('cannot schedule new futures after shutdown') RuntimeError: cannot schedule new futures after shutdown ERROR:apscheduler.scheduler:Error submitting job 
"BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-25 01:44:20 CEST)" to executor "default" Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/schedulers/base.py", line 979, in _process_jobs executor.submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/base.py", line 71, in submit_job self._do_submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/pool.py", line 28, in _do_submit_job f = self._pool.submit(run_job, job, job._jobstore_alias, run_times, self._logger.name) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/concurrent/futures/thread.py", line 179, in submit raise RuntimeError('cannot schedule new futures after shutdown') RuntimeError: cannot schedule new futures after shutdown ERROR:apscheduler.scheduler:Error submitting job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-25 01:44:20 CEST)" to executor "default" Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/schedulers/base.py", line 979, in _process_jobs executor.submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/base.py", line 71, in submit_job self._do_submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/pool.py", line 28, in _do_submit_job f = self._pool.submit(run_job, job, job._jobstore_alias, run_times, self._logger.name) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/concurrent/futures/thread.py", line 179, in submit raise RuntimeError('cannot schedule new futures after shutdown') RuntimeError: cannot schedule new futures after shutdown ERROR:apscheduler.scheduler:Error submitting job 
"BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-25 01:44:20 CEST)" to executor "default" Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/schedulers/base.py", line 979, in _process_jobs executor.submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/base.py", line 71, in submit_job self._do_submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/pool.py", line 28, in _do_submit_job f = self._pool.submit(run_job, job, job._jobstore_alias, run_times, self._logger.name) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/concurrent/futures/thread.py", line 179, in submit raise RuntimeError('cannot schedule new futures after shutdown') RuntimeError: cannot schedule new futures after shutdown ERROR:apscheduler.scheduler:Error submitting job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-25 01:44:20 CEST)" to executor "default" Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/schedulers/base.py", line 979, in _process_jobs executor.submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/base.py", line 71, in submit_job self._do_submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/pool.py", line 28, in _do_submit_job f = self._pool.submit(run_job, job, job._jobstore_alias, run_times, self._logger.name) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/concurrent/futures/thread.py", line 179, in submit raise RuntimeError('cannot schedule new futures after shutdown') RuntimeError: cannot schedule new futures after shutdown ERROR:apscheduler.scheduler:Error submitting job 
"BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-25 01:44:20 CEST)" to executor "default" Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/schedulers/base.py", line 979, in _process_jobs executor.submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/base.py", line 71, in submit_job self._do_submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/pool.py", line 28, in _do_submit_job f = self._pool.submit(run_job, job, job._jobstore_alias, run_times, self._logger.name) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/concurrent/futures/thread.py", line 179, in submit raise RuntimeError('cannot schedule new futures after shutdown') RuntimeError: cannot schedule new futures after shutdown ERROR:apscheduler.scheduler:Error submitting job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-25 01:44:20 CEST)" to executor "default" Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/schedulers/base.py", line 979, in _process_jobs executor.submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/base.py", line 71, in submit_job self._do_submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/pool.py", line 28, in _do_submit_job f = self._pool.submit(run_job, job, job._jobstore_alias, run_times, self._logger.name) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/concurrent/futures/thread.py", line 179, in submit raise RuntimeError('cannot schedule new futures after shutdown') RuntimeError: cannot schedule new futures after shutdown ERROR:apscheduler.scheduler:Error submitting job 
"BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-25 01:44:20 CEST)" to executor "default" Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/schedulers/base.py", line 979, in _process_jobs executor.submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/base.py", line 71, in submit_job self._do_submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/pool.py", line 28, in _do_submit_job f = self._pool.submit(run_job, job, job._jobstore_alias, run_times, self._logger.name) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/concurrent/futures/thread.py", line 179, in submit raise RuntimeError('cannot schedule new futures after shutdown') RuntimeError: cannot schedule new futures after shutdown ERROR:apscheduler.scheduler:Error submitting job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-25 01:44:20 CEST)" to executor "default" Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/schedulers/base.py", line 979, in _process_jobs executor.submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/base.py", line 71, in submit_job self._do_submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/pool.py", line 28, in _do_submit_job f = self._pool.submit(run_job, job, job._jobstore_alias, run_times, self._logger.name) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/concurrent/futures/thread.py", line 179, in submit raise RuntimeError('cannot schedule new futures after shutdown') RuntimeError: cannot schedule new futures after shutdown ERROR:apscheduler.scheduler:Error submitting job 
"BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-25 01:44:20 CEST)" to executor "default" Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/schedulers/base.py", line 979, in _process_jobs executor.submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/base.py", line 71, in submit_job self._do_submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/pool.py", line 28, in _do_submit_job f = self._pool.submit(run_job, job, job._jobstore_alias, run_times, self._logger.name) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/concurrent/futures/thread.py", line 179, in submit raise RuntimeError('cannot schedule new futures after shutdown') RuntimeError: cannot schedule new futures after shutdown ERROR:apscheduler.scheduler:Error submitting job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-25 01:44:20 CEST)" to executor "default" Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/schedulers/base.py", line 979, in _process_jobs executor.submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/base.py", line 71, in submit_job self._do_submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/pool.py", line 28, in _do_submit_job f = self._pool.submit(run_job, job, job._jobstore_alias, run_times, self._logger.name) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/concurrent/futures/thread.py", line 179, in submit raise RuntimeError('cannot schedule new futures after shutdown') RuntimeError: cannot schedule new futures after shutdown ERROR:apscheduler.scheduler:Error submitting job 
[... identical ERROR:apscheduler.scheduler traceback repeated every minute through 2021-09-25 01:47:20 CEST ...]
"BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-25 01:47:20 CEST)" to executor "default" Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/schedulers/base.py", line 979, in _process_jobs executor.submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/base.py", line 71, in submit_job self._do_submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/pool.py", line 28, in _do_submit_job f = self._pool.submit(run_job, job, job._jobstore_alias, run_times, self._logger.name) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/concurrent/futures/thread.py", line 179, in submit raise RuntimeError('cannot schedule new futures after shutdown') RuntimeError: cannot schedule new futures after shutdown ERROR:apscheduler.scheduler:Error submitting job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-25 01:47:20 CEST)" to executor "default" Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/schedulers/base.py", line 979, in _process_jobs executor.submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/base.py", line 71, in submit_job self._do_submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/pool.py", line 28, in _do_submit_job f = self._pool.submit(run_job, job, job._jobstore_alias, run_times, self._logger.name) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/concurrent/futures/thread.py", line 179, in submit raise RuntimeError('cannot schedule new futures after shutdown') RuntimeError: cannot schedule new futures after shutdown ERROR:apscheduler.scheduler:Error submitting job 
"BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-25 01:47:20 CEST)" to executor "default" Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/schedulers/base.py", line 979, in _process_jobs executor.submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/base.py", line 71, in submit_job self._do_submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/pool.py", line 28, in _do_submit_job f = self._pool.submit(run_job, job, job._jobstore_alias, run_times, self._logger.name) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/concurrent/futures/thread.py", line 179, in submit raise RuntimeError('cannot schedule new futures after shutdown') RuntimeError: cannot schedule new futures after shutdown ERROR:apscheduler.scheduler:Error submitting job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-25 01:47:20 CEST)" to executor "default" Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/schedulers/base.py", line 979, in _process_jobs executor.submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/base.py", line 71, in submit_job self._do_submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/pool.py", line 28, in _do_submit_job f = self._pool.submit(run_job, job, job._jobstore_alias, run_times, self._logger.name) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/concurrent/futures/thread.py", line 179, in submit raise RuntimeError('cannot schedule new futures after shutdown') RuntimeError: cannot schedule new futures after shutdown ERROR:apscheduler.scheduler:Error submitting job 
"BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-25 01:48:20 CEST)" to executor "default" Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/schedulers/base.py", line 979, in _process_jobs executor.submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/base.py", line 71, in submit_job self._do_submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/pool.py", line 28, in _do_submit_job f = self._pool.submit(run_job, job, job._jobstore_alias, run_times, self._logger.name) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/concurrent/futures/thread.py", line 179, in submit raise RuntimeError('cannot schedule new futures after shutdown') RuntimeError: cannot schedule new futures after shutdown ERROR:apscheduler.scheduler:Error submitting job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-25 01:48:20 CEST)" to executor "default" Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/schedulers/base.py", line 979, in _process_jobs executor.submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/base.py", line 71, in submit_job self._do_submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/pool.py", line 28, in _do_submit_job f = self._pool.submit(run_job, job, job._jobstore_alias, run_times, self._logger.name) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/concurrent/futures/thread.py", line 179, in submit raise RuntimeError('cannot schedule new futures after shutdown') RuntimeError: cannot schedule new futures after shutdown ERROR:apscheduler.scheduler:Error submitting job 
"BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-25 01:48:20 CEST)" to executor "default" Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/schedulers/base.py", line 979, in _process_jobs executor.submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/base.py", line 71, in submit_job self._do_submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/pool.py", line 28, in _do_submit_job f = self._pool.submit(run_job, job, job._jobstore_alias, run_times, self._logger.name) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/concurrent/futures/thread.py", line 179, in submit raise RuntimeError('cannot schedule new futures after shutdown') RuntimeError: cannot schedule new futures after shutdown ERROR:apscheduler.scheduler:Error submitting job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-25 01:48:20 CEST)" to executor "default" Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/schedulers/base.py", line 979, in _process_jobs executor.submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/base.py", line 71, in submit_job self._do_submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/pool.py", line 28, in _do_submit_job f = self._pool.submit(run_job, job, job._jobstore_alias, run_times, self._logger.name) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/concurrent/futures/thread.py", line 179, in submit raise RuntimeError('cannot schedule new futures after shutdown') RuntimeError: cannot schedule new futures after shutdown ERROR:apscheduler.scheduler:Error submitting job 
"BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-25 01:48:20 CEST)" to executor "default" Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/schedulers/base.py", line 979, in _process_jobs executor.submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/base.py", line 71, in submit_job self._do_submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/pool.py", line 28, in _do_submit_job f = self._pool.submit(run_job, job, job._jobstore_alias, run_times, self._logger.name) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/concurrent/futures/thread.py", line 179, in submit raise RuntimeError('cannot schedule new futures after shutdown') RuntimeError: cannot schedule new futures after shutdown ERROR:apscheduler.scheduler:Error submitting job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-25 01:48:20 CEST)" to executor "default" Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/schedulers/base.py", line 979, in _process_jobs executor.submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/base.py", line 71, in submit_job self._do_submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/pool.py", line 28, in _do_submit_job f = self._pool.submit(run_job, job, job._jobstore_alias, run_times, self._logger.name) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/concurrent/futures/thread.py", line 179, in submit raise RuntimeError('cannot schedule new futures after shutdown') RuntimeError: cannot schedule new futures after shutdown ERROR:apscheduler.scheduler:Error submitting job 
"BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-25 01:48:20 CEST)" to executor "default" Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/schedulers/base.py", line 979, in _process_jobs executor.submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/base.py", line 71, in submit_job self._do_submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/pool.py", line 28, in _do_submit_job f = self._pool.submit(run_job, job, job._jobstore_alias, run_times, self._logger.name) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/concurrent/futures/thread.py", line 179, in submit raise RuntimeError('cannot schedule new futures after shutdown') RuntimeError: cannot schedule new futures after shutdown ERROR:apscheduler.scheduler:Error submitting job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-25 01:48:20 CEST)" to executor "default" Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/schedulers/base.py", line 979, in _process_jobs executor.submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/base.py", line 71, in submit_job self._do_submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/pool.py", line 28, in _do_submit_job f = self._pool.submit(run_job, job, job._jobstore_alias, run_times, self._logger.name) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/concurrent/futures/thread.py", line 179, in submit raise RuntimeError('cannot schedule new futures after shutdown') RuntimeError: cannot schedule new futures after shutdown ERROR:apscheduler.scheduler:Error submitting job 
"BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-25 01:48:20 CEST)" to executor "default" Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/schedulers/base.py", line 979, in _process_jobs executor.submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/base.py", line 71, in submit_job self._do_submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/pool.py", line 28, in _do_submit_job f = self._pool.submit(run_job, job, job._jobstore_alias, run_times, self._logger.name) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/concurrent/futures/thread.py", line 179, in submit raise RuntimeError('cannot schedule new futures after shutdown') RuntimeError: cannot schedule new futures after shutdown ERROR:apscheduler.scheduler:Error submitting job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-25 01:48:20 CEST)" to executor "default" Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/schedulers/base.py", line 979, in _process_jobs executor.submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/base.py", line 71, in submit_job self._do_submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/pool.py", line 28, in _do_submit_job f = self._pool.submit(run_job, job, job._jobstore_alias, run_times, self._logger.name) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/concurrent/futures/thread.py", line 179, in submit raise RuntimeError('cannot schedule new futures after shutdown') RuntimeError: cannot schedule new futures after shutdown ERROR:apscheduler.scheduler:Error submitting job 
"BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-25 01:48:20 CEST)" to executor "default" Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/schedulers/base.py", line 979, in _process_jobs executor.submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/base.py", line 71, in submit_job self._do_submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/pool.py", line 28, in _do_submit_job f = self._pool.submit(run_job, job, job._jobstore_alias, run_times, self._logger.name) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/concurrent/futures/thread.py", line 179, in submit raise RuntimeError('cannot schedule new futures after shutdown') RuntimeError: cannot schedule new futures after shutdown ERROR:apscheduler.scheduler:Error submitting job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-25 01:48:20 CEST)" to executor "default" Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/schedulers/base.py", line 979, in _process_jobs executor.submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/base.py", line 71, in submit_job self._do_submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/pool.py", line 28, in _do_submit_job f = self._pool.submit(run_job, job, job._jobstore_alias, run_times, self._logger.name) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/concurrent/futures/thread.py", line 179, in submit raise RuntimeError('cannot schedule new futures after shutdown') RuntimeError: cannot schedule new futures after shutdown ERROR:apscheduler.scheduler:Error submitting job 
"BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-25 01:49:20 CEST)" to executor "default" Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/schedulers/base.py", line 979, in _process_jobs executor.submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/base.py", line 71, in submit_job self._do_submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/pool.py", line 28, in _do_submit_job f = self._pool.submit(run_job, job, job._jobstore_alias, run_times, self._logger.name) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/concurrent/futures/thread.py", line 179, in submit raise RuntimeError('cannot schedule new futures after shutdown') RuntimeError: cannot schedule new futures after shutdown ERROR:apscheduler.scheduler:Error submitting job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-25 01:49:20 CEST)" to executor "default" Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/schedulers/base.py", line 979, in _process_jobs executor.submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/base.py", line 71, in submit_job self._do_submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/pool.py", line 28, in _do_submit_job f = self._pool.submit(run_job, job, job._jobstore_alias, run_times, self._logger.name) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/concurrent/futures/thread.py", line 179, in submit raise RuntimeError('cannot schedule new futures after shutdown') RuntimeError: cannot schedule new futures after shutdown ERROR:apscheduler.scheduler:Error submitting job 
"BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-25 01:49:20 CEST)" to executor "default" Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/schedulers/base.py", line 979, in _process_jobs executor.submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/base.py", line 71, in submit_job self._do_submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/pool.py", line 28, in _do_submit_job f = self._pool.submit(run_job, job, job._jobstore_alias, run_times, self._logger.name) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/concurrent/futures/thread.py", line 179, in submit raise RuntimeError('cannot schedule new futures after shutdown') RuntimeError: cannot schedule new futures after shutdown ERROR:apscheduler.scheduler:Error submitting job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-25 01:49:20 CEST)" to executor "default" Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/schedulers/base.py", line 979, in _process_jobs executor.submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/base.py", line 71, in submit_job self._do_submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/pool.py", line 28, in _do_submit_job f = self._pool.submit(run_job, job, job._jobstore_alias, run_times, self._logger.name) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/concurrent/futures/thread.py", line 179, in submit raise RuntimeError('cannot schedule new futures after shutdown') RuntimeError: cannot schedule new futures after shutdown ERROR:apscheduler.scheduler:Error submitting job 
"BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-25 01:49:20 CEST)" to executor "default" Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/schedulers/base.py", line 979, in _process_jobs executor.submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/base.py", line 71, in submit_job self._do_submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/pool.py", line 28, in _do_submit_job f = self._pool.submit(run_job, job, job._jobstore_alias, run_times, self._logger.name) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/concurrent/futures/thread.py", line 179, in submit raise RuntimeError('cannot schedule new futures after shutdown') RuntimeError: cannot schedule new futures after shutdown ERROR:apscheduler.scheduler:Error submitting job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-25 01:49:20 CEST)" to executor "default" Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/schedulers/base.py", line 979, in _process_jobs executor.submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/base.py", line 71, in submit_job self._do_submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/pool.py", line 28, in _do_submit_job f = self._pool.submit(run_job, job, job._jobstore_alias, run_times, self._logger.name) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/concurrent/futures/thread.py", line 179, in submit raise RuntimeError('cannot schedule new futures after shutdown') RuntimeError: cannot schedule new futures after shutdown ERROR:apscheduler.scheduler:Error submitting job 
"BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-25 01:49:20 CEST)" to executor "default" Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/schedulers/base.py", line 979, in _process_jobs executor.submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/base.py", line 71, in submit_job self._do_submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/pool.py", line 28, in _do_submit_job f = self._pool.submit(run_job, job, job._jobstore_alias, run_times, self._logger.name) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/concurrent/futures/thread.py", line 179, in submit raise RuntimeError('cannot schedule new futures after shutdown') RuntimeError: cannot schedule new futures after shutdown ERROR:apscheduler.scheduler:Error submitting job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-25 01:49:20 CEST)" to executor "default" Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/schedulers/base.py", line 979, in _process_jobs executor.submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/base.py", line 71, in submit_job self._do_submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/pool.py", line 28, in _do_submit_job f = self._pool.submit(run_job, job, job._jobstore_alias, run_times, self._logger.name) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/concurrent/futures/thread.py", line 179, in submit raise RuntimeError('cannot schedule new futures after shutdown') RuntimeError: cannot schedule new futures after shutdown ERROR:apscheduler.scheduler:Error submitting job 
"BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-25 01:49:20 CEST)" to executor "default" Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/schedulers/base.py", line 979, in _process_jobs executor.submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/base.py", line 71, in submit_job self._do_submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/pool.py", line 28, in _do_submit_job f = self._pool.submit(run_job, job, job._jobstore_alias, run_times, self._logger.name) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/concurrent/futures/thread.py", line 179, in submit raise RuntimeError('cannot schedule new futures after shutdown') RuntimeError: cannot schedule new futures after shutdown ERROR:apscheduler.scheduler:Error submitting job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-25 01:49:20 CEST)" to executor "default" Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/schedulers/base.py", line 979, in _process_jobs executor.submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/base.py", line 71, in submit_job self._do_submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/pool.py", line 28, in _do_submit_job f = self._pool.submit(run_job, job, job._jobstore_alias, run_times, self._logger.name) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/concurrent/futures/thread.py", line 179, in submit raise RuntimeError('cannot schedule new futures after shutdown') RuntimeError: cannot schedule new futures after shutdown ERROR:apscheduler.scheduler:Error submitting job 
"BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-25 01:49:20 CEST)" to executor "default" Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/schedulers/base.py", line 979, in _process_jobs executor.submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/base.py", line 71, in submit_job self._do_submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/pool.py", line 28, in _do_submit_job f = self._pool.submit(run_job, job, job._jobstore_alias, run_times, self._logger.name) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/concurrent/futures/thread.py", line 179, in submit raise RuntimeError('cannot schedule new futures after shutdown') RuntimeError: cannot schedule new futures after shutdown ERROR:apscheduler.scheduler:Error submitting job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-25 01:49:20 CEST)" to executor "default" Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/schedulers/base.py", line 979, in _process_jobs executor.submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/base.py", line 71, in submit_job self._do_submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/pool.py", line 28, in _do_submit_job f = self._pool.submit(run_job, job, job._jobstore_alias, run_times, self._logger.name) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/concurrent/futures/thread.py", line 179, in submit raise RuntimeError('cannot schedule new futures after shutdown') RuntimeError: cannot schedule new futures after shutdown ERROR:apscheduler.scheduler:Error submitting job 
"BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-25 01:50:20 CEST)" to executor "default" Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/schedulers/base.py", line 979, in _process_jobs executor.submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/base.py", line 71, in submit_job self._do_submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/pool.py", line 28, in _do_submit_job f = self._pool.submit(run_job, job, job._jobstore_alias, run_times, self._logger.name) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/concurrent/futures/thread.py", line 179, in submit raise RuntimeError('cannot schedule new futures after shutdown') RuntimeError: cannot schedule new futures after shutdown ERROR:apscheduler.scheduler:Error submitting job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-25 01:50:20 CEST)" to executor "default" Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/schedulers/base.py", line 979, in _process_jobs executor.submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/base.py", line 71, in submit_job self._do_submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/pool.py", line 28, in _do_submit_job f = self._pool.submit(run_job, job, job._jobstore_alias, run_times, self._logger.name) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/concurrent/futures/thread.py", line 179, in submit raise RuntimeError('cannot schedule new futures after shutdown') RuntimeError: cannot schedule new futures after shutdown ERROR:apscheduler.scheduler:Error submitting job 
"BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-25 01:52:20 CEST)" to executor "default" Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/schedulers/base.py", line 979, in _process_jobs executor.submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/base.py", line 71, in submit_job self._do_submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/pool.py", line 28, in _do_submit_job f = self._pool.submit(run_job, job, job._jobstore_alias, run_times, self._logger.name) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/concurrent/futures/thread.py", line 179, in submit raise RuntimeError('cannot schedule new futures after shutdown') RuntimeError: cannot schedule new futures after shutdown ERROR:apscheduler.scheduler:Error submitting job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-25 01:52:20 CEST)" to executor "default" Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/schedulers/base.py", line 979, in _process_jobs executor.submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/base.py", line 71, in submit_job self._do_submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/pool.py", line 28, in _do_submit_job f = self._pool.submit(run_job, job, job._jobstore_alias, run_times, self._logger.name) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/concurrent/futures/thread.py", line 179, in submit raise RuntimeError('cannot schedule new futures after shutdown') RuntimeError: cannot schedule new futures after shutdown srun: Job step aborted: Waiting up to 62 seconds 
for job step to finish.
Killing subprocess 48057
Killing subprocess 31112
Main process received SIGTERM, exiting
[further "Killing subprocess <pid>" / "Main process received SIGTERM, exiting" messages from the remaining nodes elided]
[2021-09-25 02:08:19] PULSE: tr8-104B is waiting to be scheduled (1165978_[1-10%1] on 'gpu_p13' partition)
[OMP_NUM_THREADS warning printed once per launcher process on restart; duplicates elided]
--------------------------------------------------
DeepSpeed C++/CUDA extension op report
--------------------------------------------------
NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.
--------------------------------------------------
JIT compiled ops requires ninja
--------------------------------------------------
ninja .................. [OKAY]
--------------------------------------------------
op name ................ installed .. compatible
--------------------------------------------------
cpu_adam ............... [YES] ...... [OKAY]
fused_adam ............. [NO] ....... [OKAY]
fused_lamb ............. [NO] ....... [OKAY]
sparse_attn ............ [NO] ....... [OKAY]
transformer ............ [NO] ....... [OKAY]
stochastic_transformer . [NO] ....... [OKAY]
--------------------------------------------------
[the same op report is printed by every rank; the interleaved duplicate copies are elided]
[YES] ............ [YES] ...... [OKAY] [OKAY]...... [OKAY] [OKAY] fused_adamfused_adam fused_adamfused_adam............. .......................................[NO] [NO][NO][NO]....... ....... .............. [OKAY][OKAY] [OKAY][OKAY] fused_lamb fused_lamb.............fused_lambfused_lamb ............. .............[NO] ............. [NO][NO]....... [NO] ....... [OKAY]....... [OKAY]....... [OKAY][OKAY] sparse_attn ............ [NO] .......sparse_attnsparse_attn sparse_attn........................[OKAY] ............[NO][NO] transformer [NO]....... ....... [OKAY]............[OKAY]....... [NO] [OKAY]transformer transformer................... transformer ............[OKAY][NO] [NO]................... [NO][OKAY]stochastic_transformer....... .......[OKAY] . stochastic_transformer[OKAY] [NO] .stochastic_transformer....... stochastic_transformer[NO] [OKAY]........ .[NO][OKAY] .......[NO] [OKAY]....... [OKAY] ---------------------------------------------------------------------------------------------------- DeepSpeed C++/CUDA extension op reportDeepSpeed C++/CUDA extension op report ------------------------------------------------------------------------------------------------------------------------------------------------------ DeepSpeed C++/CUDA extension op reportNOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.-------------------------------------------------- ----------------------------------------------------------------------------------------------------DeepSpeed C++/CUDA extension op report -------------------------------------------------- JIT compiled ops requires ninja NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. 
Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. JIT compiled ops requires ninja -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report-------------------------------------------------- -------------------------------------------------- DeepSpeed C++/CUDA extension op reportNOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. ------------------------------------------------------------------------------------------------------------------------------------------------------ NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.JIT compiled ops requires ninja -------------------------------------------------- --------------------------------------------------DeepSpeed C++/CUDA extension op report DeepSpeed C++/CUDA extension op reportJIT compiled ops requires ninja-------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.-------------------------------------------------- --------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. 
JIT compiled ops requires ninja -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report-------------------------------------------------- -------------------------------------------------- -------------------------------------------------- DeepSpeed C++/CUDA extension op reportDeepSpeed C++/CUDA extension op reportNOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.DeepSpeed C++/CUDA extension op reportNOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. JIT compiled ops requires ninja -------------------------------------------------- ---------------------------------------------------------------------------------------------------- JIT compiled ops requires ninjaNOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. JIT compiled ops requires ninja -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. 
Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ---------------------------------------------------------------------------------------------------- DeepSpeed C++/CUDA extension op report-------------------------------------------------- DeepSpeed C++/CUDA extension op report ---------------------------------------------------------------------------------------------------- DeepSpeed C++/CUDA extension op report NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.---------------------------------------------------------------------------------------------------- --------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.JIT compiled ops requires ninja JIT compiled ops requires ninja -------------------------------------------------- JIT compiled ops requires ninja ninjaninjaninjaninja ........................................................................ [OKAY][OKAY][OKAY] [OKAY] ------------------------------------------------------------------------------------------------------------------------------------------------------ -------------------------------------------------- op name op nameop name ................op name................................ installed installed ................installed .. .. ..compatibleinstalledcompatible compatible -------------------------------------------------- --------------------------------------------------.. 
-------------------------------------------------- compatible -------------------------------------------------- cpu_adamcpu_adam ...............cpu_adam............... [YES]............... [YES] ...... [YES] ...... [OKAY] ......cpu_adam [OKAY] [OKAY]............... [YES] ...... [OKAY]fused_adam fused_adam............. .............fused_adam[NO] [NO].................... .......[NO][OKAY] [OKAY]....... [OKAY]fused_lamb .............fused_lamb fused_adam[NO]fused_lamb............. .................................[NO] [OKAY][NO] ....... [NO] .......[OKAY] .......[OKAY] [OKAY] fused_lambsparse_attn .........................sparse_attn [NO][NO] ................... sparse_attn[NO] ....... [OKAY] ...................[OKAY] [OKAY]transformer[NO] ................... transformer[NO][OKAY] ............ ....... [NO][OKAY] .......transformer [OKAY]............ stochastic_transformer[NO] ........ stochastic_transformersparse_attn [OKAY] [NO] .................... [NO][OKAY]stochastic_transformer[NO] ....... .......[OKAY]. [OKAY][NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] ninjaninjaninja ninja ...................................................... ..................[OKAY][OKAY] [OKAY] [OKAY]-------------------------------------------------- -------------------------------------------------- ---------------------------------------------------------------------------------------------------- op nameop nameop nameop name ................................................................ installed installedinstalledinstalled .. .... .. compatiblecompatiblecompatible compatible -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- cpu_adamcpu_adam cpu_adam ............... ............... ............... [YES] [YES] [YES] ...... ...... ...... 
[OKAY] [OKAY] [OKAY] cpu_adam ............... [YES] ...... [OKAY] fused_adamfused_adam .............fused_adam............. [NO].............[NO] .......[NO]....... [OKAY].......[OKAY] [OKAY] fused_lambfused_lamb .............fused_lamb............. [NO].............[NO]fused_adam [NO]........................... [OKAY] ....... [OKAY] [OKAY] [NO] ....... [OKAY] fused_lambsparse_attn sparse_attn............ ............[NO]sparse_attn .......[NO]......................... [NO][NO].......[OKAY] ..............[OKAY] transformer [OKAY] [OKAY]............transformer [NO]............ transformer ....... [NO] ............[OKAY]....... [NO][OKAY] .......stochastic_transformer [OKAY] stochastic_transformer. [NO] .stochastic_transformer....... [NO][OKAY] ........ [OKAY][NO] sparse_attn....... [OKAY] ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] ninjaninjaninjaninja ...................................................... .................. [OKAY][OKAY] [OKAY] [OKAY] ------------------------------------------------------------------------------------------------------------------------------------------------------ -------------------------------------------------- op name op nameop name op name................................................ installed installed................ installed .. installed.. .. ..compatible compatible compatible--------------------------------------------------compatible -------------------------------------------------- ---------------------------------------------------------------------------------------------------- cpu_adam cpu_adam...............cpu_adam cpu_adam...............[YES] ............... ...............[YES]......[YES] ......[YES][OKAY]...... [OKAY][OKAY] ...... [OKAY] fused_adam fused_adam.............fused_adam [NO]............. fused_adam ....... .............[NO].............[OKAY] [NO].......[NO] .......[OKAY]....... fused_lamb [OKAY][OKAY] ............. 
[NO]fused_lamb fused_lambfused_lamb....... ..........................[OKAY] ............. [NO] [NO] [NO] .............. .......[OKAY][OKAY] [OKAY] sparse_attn ............ [NO] ....... [OKAY] sparse_attnsparse_attnsparse_attn transformer .................................... [NO][NO]............[NO] ..............[NO]....... [OKAY].......[OKAY] [OKAY] transformertransformer[OKAY] ............transformer............ [NO]............[NO]stochastic_transformer ..............[NO] [OKAY]. [OKAY] .......[NO] stochastic_transformer[OKAY]....... stochastic_transformer [OKAY]. .stochastic_transformer[NO] [NO]........ ....... [NO] [OKAY] [OKAY] ....... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.-------------------------------------------------- -------------------------------------------------- JIT compiled ops requires ninjaDeepSpeed C++/CUDA extension op report ---------------------------------------------------------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. 
--------------------------------------------------DeepSpeed C++/CUDA extension op report JIT compiled ops requires ninja -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. ninjaninjaninjaninja ...................................................... ..................[OKAY][OKAY][OKAY] [OKAY]---------------------------------------------------------------------------------------------------- -------------------------------------------------- -------------------------------------------------- JIT compiled ops requires ninja --------------------------------------------------op nameop nameop name ................................................ installedop name installed ..installed................ .. .. installedcompatible compatiblecompatible --------------------------------------------------..---------------------------------------------------------------------------------------------------- compatible -------------------------------------------------- cpu_adam cpu_adam............... cpu_adam cpu_adam ...............[YES] ..................... ............... [YES] [YES][YES][OKAY] ...... ............ [OKAY][OKAY][OKAY] fused_adam ............. [NO]fused_adam fused_adam....... fused_adam ..........................[OKAY] .............[NO] [NO] [NO].......fused_lamb [OKAY]........................... [NO][OKAY][OKAY] fused_lamb ....... fused_lamb.............[OKAY] fused_lamb [NO] ................................. [NO][OKAY][NO] .............. [OKAY][OKAY] sparse_attn ............ [NO] .......sparse_attn [OKAY]............sparse_attn sparse_attn[NO]............transformer ....... [NO]............ ............ [OKAY] .......[NO] [NO] [OKAY] ....... transformer....... transformer............[OKAY] [NO]............[OKAY] ....... stochastic_transformer[OKAY][NO]transformer .................... 
[NO]stochastic_transformer[NO] [OKAY] ............... [NO][OKAY] [OKAY]stochastic_transformer ....... [OKAY]. stochastic_transformer [NO] ........ [OKAY][NO] ....... [OKAY] -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- DeepSpeed C++/CUDA extension op reportDeepSpeed C++/CUDA extension op reportDeepSpeed C++/CUDA extension op reportDeepSpeed C++/CUDA extension op report ------------------------------------------------------------------------------------------------------------------------------------------------------ -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- -------------------------------------------------- -------------------------------------------------- -------------------------------------------------- JIT compiled ops requires ninjaJIT compiled ops requires ninja JIT compiled ops requires ninja JIT compiled ops requires ninja ninjaninjaninjaninja ........................................................................ 
[OKAY][OKAY] [OKAY] [OKAY]-------------------------------------------------- -------------------------------------------------- -------------------------------------------------- op nameop name-------------------------------------------------- op name ................ ................ installed................installed op name .... installed ................compatible compatible .. installed ---------------------------------------------------------------------------------------------------- compatible .. compatible-------------------------------------------------- -------------------------------------------------- cpu_adamcpu_adam .............................. [YES][YES] cpu_adam cpu_adam...... ...... ...............[OKAY][OKAY]............... [YES] [YES]...... ......[OKAY] [OKAY] fused_adam ............. [NO]fused_adam .................... fused_adam[NO][OKAY] fused_adam ....... ..........................[OKAY]fused_lamb [NO][NO]............. fused_lamb .......[NO] .............. ............. [OKAY][OKAY][OKAY] [NO] ....... [OKAY]fused_lamb fused_lamb .......................... [NO]sparse_attn[NO] .......................... [OKAY][NO]sparse_attn[OKAY] ................... [NO][OKAY] ....... [OKAY]transformer ............ transformer[NO] sparse_attn...................sparse_attn [OKAY][NO]........................ .......[NO][NO] [OKAY]stochastic_transformer.............. [OKAY].[OKAY] stochastic_transformer [NO]transformer transformer. ....... ............ [NO]............ [OKAY] [NO] [NO] ....... ..............[OKAY] [OKAY][OKAY] stochastic_transformerstochastic_transformer . .[NO] .......[NO] [OKAY]....... 
[OKAY] ---------------------------------------------------------------------------------------------------- DeepSpeed C++/CUDA extension op report DeepSpeed C++/CUDA extension op report-------------------------------------------------- ---------------------------------------------------------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. DeepSpeed C++/CUDA extension op report---------------------------------------------------------------------------------------------------- --------------------------------------------------JIT compiled ops requires ninjaJIT compiled ops requires ninja NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.-------------------------------------------------- -------------------------------------------------- DeepSpeed C++/CUDA extension op report JIT compiled ops requires ninja -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninjaninjaninja ninja .................. .................. .................................... [OKAY][OKAY] [OKAY][OKAY] -------------------------------------------------- -------------------------------------------------- -------------------------------------------------- --------------------------------------------------op name op name op nameop name................ ................ 
................ installedinstalled................ installed.. .. installed compatible compatible.. .. -------------------------------------------------- --------------------------------------------------compatible compatible ---------------------------------------------------------------------------------------------------- cpu_adam ............... [YES]cpu_adam cpu_adam..................... cpu_adam[YES]...............[OKAY] ............... ...... [YES] ......[YES] [OKAY] [OKAY]fused_adam...... .............[OKAY] [NO] ....... [OKAY] fused_lambfused_adamfused_adam fused_adam ....................................... ............. [NO] [NO][NO][NO]....... .............. .......[OKAY][OKAY] [OKAY][OKAY] fused_lambfused_lamb fused_lamb.......................... .............[NO][NO] .......[NO]....... sparse_attn[OKAY].......[OKAY] [OKAY]............ [NO] ....... [OKAY] transformer ............ [NO]sparse_attn sparse_attn ....... sparse_attn............ ............ [OKAY] [NO]............ [NO] ..............[NO] stochastic_transformer [OKAY] [OKAY] ....... . [OKAY]transformer[NO] transformer ................... [NO]transformer............ [OKAY] ....... ............ [NO] [OKAY] [NO] ....... .......[OKAY] [OKAY]stochastic_transformer stochastic_transformer. stochastic_transformer [NO]. ........[NO] [OKAY][NO]....... .......[OKAY] [OKAY] -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- DeepSpeed C++/CUDA extension op reportDeepSpeed C++/CUDA extension op reportDeepSpeed C++/CUDA extension op reportDeepSpeed C++/CUDA extension op report ---------------------------------------------------------------------------------------------------- ---------------------------------------------------------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. 
Op compatibility means that your system meet the required dependencies to JIT install the op. NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- -------------------------------------------------- -------------------------------------------------- --------------------------------------------------JIT compiled ops requires ninja JIT compiled ops requires ninja JIT compiled ops requires ninja JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ---------------------------------------------------------------------------------------------------- DeepSpeed C++/CUDA extension op reportDeepSpeed C++/CUDA extension op report ---------------------------------------------------------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. 
Op compatibility means that your system meet the required dependencies to JIT install the op.-------------------------------------------------- --------------------------------------------------JIT compiled ops requires ninja-------------------------------------------------- JIT compiled ops requires ninja DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninjaninjaninjaninja .................. ......................................................[OKAY] [OKAY] [OKAY]-------------------------------------------------- [OKAY] ---------------------------------------------------------------------------------------------------- op name-------------------------------------------------- op name op name................ op name ................ ................installed ................ installed.. installed ..installed compatible .. .. compatible-------------------------------------------------- compatible compatible -------------------------------------------------- ---------------------------------------------------------------------------------------------------- cpu_adam ............... [YES] cpu_adam......cpu_adamcpu_adam ...............[OKAY]............... ............... [YES] [YES][YES]...... ............[OKAY] [OKAY][OKAY]fused_adam ............. [NO] ....... [OKAY] fused_adamfused_lamb .......................... fused_adam[NO][NO]fused_adam ........................... ............. [NO] [OKAY] [NO] [OKAY] .............. [OKAY][OKAY] fused_lamb ............. fused_lambfused_lamb[NO] .............sparse_attn.................... [NO] ............[OKAY] [NO]....... [NO] [OKAY].............. [OKAY][OKAY] transformer sparse_attn............ ............[NO] [NO]....... 
sparse_attn ............ [NO] ....... [OKAY]
transformer ............ [NO] ....... [OKAY]
stochastic_transformer . [NO] ....... [OKAY]
--------------------------------------------------
DeepSpeed C++/CUDA extension op report
--------------------------------------------------
NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.
--------------------------------------------------
JIT compiled ops requires ninja
ninja .................. [OKAY]
--------------------------------------------------
op name ................ installed .. compatible
--------------------------------------------------
cpu_adam ............... [YES] ...... [OKAY]
fused_adam ............. [NO] ....... [OKAY]
fused_lamb ............. [NO] ....... [OKAY]
sparse_attn ............ [NO] ....... [OKAY]
transformer ............ [NO] ....... [OKAY]
stochastic_transformer . [NO] ....... [OKAY]
--------------------------------------------------
 [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`.
async_io ............... [NO] ....... [NO]
transformer_inference .. [NO] ....... [OKAY]
utils .................. [YES] ...... [OKAY]
quantizer .............. [NO] ....... [OKAY]
--------------------------------------------------
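For reference, each per-op status line above follows a fixed shape: an op name, a run of dots, an `[installed]` flag, more dots, and a `[compatible]` flag. A minimal sketch of pulling that status out of a captured log (the `parse_op_report` helper is hypothetical, not part of DeepSpeed):

```python
import re

# Hypothetical helper (not a DeepSpeed API): parse per-op status lines
# such as "cpu_adam ............... [YES] ...... [OKAY]" into
# {op_name: (installed, compatible)}.
LINE = re.compile(r"^(\w+)\s*\.+\s*\[(\w+)\]\s*\.+\s*\[(\w+)\]$")

def parse_op_report(text):
    ops = {}
    for line in text.splitlines():
        m = LINE.match(line.strip())
        if m:
            # group(1) = op name, group(2) = installed, group(3) = compatible
            ops[m.group(1)] = (m.group(2), m.group(3))
    return ops

report = """\
cpu_adam ............... [YES] ...... [OKAY]
fused_adam ............. [NO] ....... [OKAY]
sparse_attn ............ [NO] ....... [OKAY]
"""
print(parse_op_report(report)["cpu_adam"])  # ('YES', 'OKAY')
```

This only mirrors the line format seen in this log; a different DeepSpeed version may format the report differently.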
Op compatibility means that your system meet the required dependencies to JIT install the op.--------------------------------------------------JIT compiled ops requires ninja ---------------------------------------------------------------------------------------------------- JIT compiled ops requires ninja JIT compiled ops requires ninja JIT compiled ops requires ninja ---------------------------------------------------------------------------------------------------- DeepSpeed C++/CUDA extension op reportDeepSpeed C++/CUDA extension op report ---------------------------------------------------------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. ------------------------------------------------------------------------------------------------------------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. JIT compiled ops requires ninjaDeepSpeed C++/CUDA extension op report--------------------------------------------------DeepSpeed C++/CUDA extension op report --------------------------------------------------JIT compiled ops requires ninja -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. 
Op compatibility means that your system meet the required dependencies to JIT install the op.-------------------------------------------------- --------------------------------------------------JIT compiled ops requires ninja JIT compiled ops requires ninja ninjaninjaninjaninja ...................................................... .................. [OKAY] [OKAY][OKAY][OKAY] ------------------------------------------------------------------------------------------------------------------------------------------------------ -------------------------------------------------- op nameop nameop name op name................ ................ ................ ................installed installedinstalled installed .... .. ..compatiblecompatible compatiblecompatible---------------------------------------------------------------------------------------------------- ---------------------------------------------------------------------------------------------------- cpu_adam ...............cpu_adam cpu_adamcpu_adam[YES] ............... .............................. ...... [YES][YES][OKAY] [YES]...... ...... [OKAY] ...... [OKAY] [OKAY] fused_adam ............. [NO]fused_adam fused_adam.................... fused_adam .............[NO][OKAY] .............[NO]....... [NO] fused_lamb.......[OKAY] ............. ....... [OKAY] [NO] fused_lamb [OKAY]fused_lamb ............. ....... [NO].............[OKAY] .......fused_lamb[NO] [OKAY].................... [NO][OKAY] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] sparse_attnsparse_attntransformer sparse_attn.................................... [NO][NO]............ [NO] .......[NO]....... .......[OKAY][OKAY]....... [OKAY][OKAY] transformerstochastic_transformer transformer transformer............ .........................[NO] [NO] [NO] .......[NO] ....... [OKAY].............. [OKAY][OKAY] [OKAY] stochastic_transformer stochastic_transformer stochastic_transformer. ..[NO] [NO][NO]....... 
..............[OKAY] [OKAY][OKAY] ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- --------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. ----------------------------------------------------------------------------------------------------DeepSpeed C++/CUDA extension op report JIT compiled ops requires ninja ---------------------------------------------------------------------------------------------------- DeepSpeed C++/CUDA extension op report NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.-------------------------------------------------- DeepSpeed C++/CUDA extension op report --------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.-------------------------------------------------- --------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. 
Op compatibility means that your system meet the required dependencies to JIT install the op.JIT compiled ops requires ninja JIT compiled ops requires ninja-------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninjaninjaninjaninja .................................... ....................................[OKAY][OKAY] [OKAY]--------------------------------------------------[OKAY]-------------------------------------------------- op name --------------------------------------------------................ --------------------------------------------------op name op nameinstalled .................................. op name installedcompatible installed ..................-------------------------------------------------- installed..compatible ..compatible -------------------------------------------------- compatible --------------------------------------------------cpu_adam -------------------------------------------------- ...............cpu_adam [YES]............... ......[YES] cpu_adam [OKAY]cpu_adam ...... 
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- DeepSpeed C++/CUDA extension op reportDeepSpeed C++/CUDA extension op reportDeepSpeed C++/CUDA extension op reportDeepSpeed C++/CUDA extension op report ------------------------------------------------------------------------------------------------------------------------------------------------------ --------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system ...............[OKAY]............... [YES][YES] ............ [OKAY][OKAY] meet the required dependencies to JIT install the op. NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.-------------------------------------------------- --------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.--------------------------------------------------JIT compiled ops requires ninja JIT compiled ops requires ninja--------------------------------------------------JIT compiled ops requires ninja JIT compiled ops requires ninja fused_adam ............. [NO] fused_adam....... .............[OKAY] [NO]fused_adamfused_adam fused_lamb ....... .......................... ............. [OKAY][NO][NO][NO] ..............fused_lamb ....... [OKAY] [OKAY] .............[OKAY] [NO] fused_lamb .................... 
fused_lamb[OKAY] .............[NO]sparse_attn [NO]................... .......[NO][OKAY] .......[OKAY] [OKAY]sparse_attn ............ [NO] transformer....... ............[OKAY] [NO] sparse_attn.......sparse_attn transformer[OKAY] ............ ........................ [NO][NO]stochastic_transformer[NO] ............... ....... [OKAY][OKAY][OKAY][NO] .......stochastic_transformer transformer [OKAY] transformer ............. ............[NO][NO] [NO].............. .......[OKAY][OKAY] [OKAY] stochastic_transformer stochastic_transformer . [NO]. .......[NO] [OKAY]....... [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- -------------------------------------------------- --------------------------------------------------DeepSpeed C++/CUDA extension op report -------------------------------------------------- DeepSpeed C++/CUDA extension op reportNOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- ---------------------------------------------------------------------------------------------------- DeepSpeed C++/CUDA extension op reportJIT compiled ops requires ninja NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. 
Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.-------------------------------------------------- --------------------------------------------------JIT compiled ops requires ninja-------------------------------------------------- JIT compiled ops requires ninja DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninjaninjaninjaninja ........................................................................ [OKAY][OKAY][OKAY][OKAY] -------------------------------------------------- ------------------------------------------------------------------------------------------------------------------------------------------------------ op name op name op nameop name................ ................ ................ ................installed installed ..installed installed ..compatible ....--------------------------------------------------compatible compatible compatible-------------------------------------------------- ---------------------------------------------------------------------------------------------------- cpu_adam ............... [YES] cpu_adam...... cpu_adam............... cpu_adam[OKAY] [YES].............................. ......[YES][YES] ......[OKAY]...... fused_adam [OKAY][OKAY]............. [NO] ....... [OKAY] fused_adam .............fused_lamb [NO]fused_adam............. fused_adam .......[NO] ............. .................... 
[OKAY] [NO] [NO] [OKAY] ..............fused_lamb [OKAY][OKAY]............. [NO] fused_lamb.......fused_lambsparse_attn .............[OKAY]......................... [NO][NO][NO] ..................... [OKAY][OKAY][OKAY] sparse_attn ............transformer [NO]............ .......[NO] [OKAY]....... [OKAY] sparse_attnsparse_attn transformerstochastic_transformer............ ........................ [NO] . [NO][NO] .............. [NO] .......[OKAY] [OKAY] [OKAY]....... [OKAY]stochastic_transformer transformer transformer ......................... [NO][NO][NO] ....... [OKAY] ninjaninjaninjaninja .................. ....................................[OKAY].................. .............. stochastic_transformer[OKAY][OKAY] [OKAY][OKAY]--------------------------------------------------[OKAY] --------------------------------------------------op name-------------------------------------------------- -------------------------------------------------- op name . [NO]stochastic_transformer ....... .[OKAY] [NO] ....... [OKAY] ................ ................op nameop nameinstalled ................ installed .................. installed .. compatible installed.. compatible --------------------------------------------------compatible--------------------------------------------------.. --------------------------------------------------compatible -------------------------------------------------- cpu_adamcpu_adam .............................. cpu_adam[YES][YES] cpu_adam ..................... ...... ............... [OKAY][YES] [OKAY] ......[YES] [OKAY]...... [OKAY] fused_adamfused_adam .......................... [NO][NO] fused_adam.............. fused_adam............. [OKAY][OKAY].............[NO] .......[NO]fused_lamb [OKAY]fused_lamb.................... [NO]............. fused_lamb[OKAY].......[NO] [OKAY]............. ....... [NO][OKAY] .......fused_lamb [OKAY] ninjaninjaninjaninja ...................................................... .................. 
[OKAY][OKAY][OKAY][OKAY] ............. [NO] ....... [OKAY] ------------------------------------------------------------------------------------------------------------------------------------------------------ -------------------------------------------------- op nameop name sparse_attn ............sparse_attn [NO]............ .......[NO] sparse_attn [OKAY] ....... ............ [OKAY][NO]transformer op name ................................ op name................ installed installed..installed ................ .. compatible.. installed compatible -------------------------------------------------- sparse_attn...................transformer [OKAY]............[NO]............ .......[NO] [NO] transformer[OKAY] ....... ..compatible -------------------------------------------------- compatible-------------------------------------------------- -------------------------------------------------- ....... ............ [OKAY] stochastic_transformer [OKAY][NO] cpu_adam cpu_adam............... ...............cpu_adam[YES] [YES]......cpu_adam............... [OKAY] ......[YES] ........ stochastic_transformer transformer [NO][OKAY] .................... stochastic_transformer [NO][NO] [OKAY] ....... ............... [OKAY] ...... .[OKAY] ....... [NO] [OKAY]....... [OKAY] [YES] [OKAY] ...... [OKAY]fused_adam ............. [NO]fused_adam .................... fused_adam [OKAY] [NO] stochastic_transformer . [NO] ....... [OKAY] ............. .......[NO]fused_lamb fused_adam [OKAY]............. ....... [NO].............[OKAY] fused_lamb....... .............[NO]fused_lamb[OKAY] [NO] .................... .......[NO][OKAY] [OKAY] ....... [OKAY] sparse_attnfused_lamb ......................... [NO] [NO]....... .......sparse_attnsparse_attn[OKAY] ........................[OKAY] [NO][NO] transformer ....... ....... ............ [OKAY][OKAY][NO] ....... transformer[OKAY] transformer ............ sparse_attn............[NO]stochastic_transformer .......[NO] ............[OKAY]. 
.......[NO] [NO] [OKAY]stochastic_transformer ....... ........[OKAY] stochastic_transformer[OKAY][NO] ........ [NO][OKAY]transformer ................... [OKAY] [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] ------------------------------------------------------------------------------------------------------------------------------------------------------ DeepSpeed C++/CUDA extension op reportDeepSpeed C++/CUDA extension op reportDeepSpeed C++/CUDA extension op report -------------------------------------------------- ---------------------------------------------------------------------------------------------------- DeepSpeed C++/CUDA extension op report--------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. ----------------------------------------------------------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.-------------------------------------------------- JIT compiled ops requires ninja NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. 
Op compatibility means that your system meet the required dependencies to JIT install the op.-------------------------------------------------- JIT compiled ops requires ninja JIT compiled ops requires ninja-------------------------------------------------- JIT compiled ops requires ninja ninjaninjaninjaninja ........................................................................ [OKAY][OKAY][OKAY][OKAY] ---------------------------------------------------------------------------------------------------- -------------------------------------------------- --------------------------------------------------op name op name op nameop name ................ ................................installed ................ installed installedinstalled.. ..compatible ....compatible-------------------------------------------------- compatiblecompatible -------------------------------------------------- ---------------------------------------------------------------------------------------------------- cpu_adam ............... [YES]cpu_adam ...............cpu_adam...... cpu_adam ............... [YES][OKAY] ............... [YES] ...... [YES] ...... [OKAY] ...... [OKAY] [OKAY] fused_adam ............. [NO] ....... [OKAY]fused_adamfused_adam fused_adam .............fused_lamb.......................... [NO].............[NO] [NO] ....... .......[NO]....... [OKAY].......[OKAY][OKAY] [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. fused_lamb fused_lambfused_lamb............. .............[NO]............. [NO] ....... [NO] ....... [OKAY]sparse_attn .......[OKAY]............ [OKAY][NO] async_io ............... [NO] ....... [NO] ....... [OKAY] transformer_inference .. [NO] ....... [OKAY] transformersparse_attn ........................ sparse_attn [NO][NO]sparse_attn ...................................... [OKAY][OKAY][NO][NO] .............. transformer[OKAY][OKAY]stochastic_transformer utils .................. 
[YES] ...... [OKAY] ............. transformer[NO] transformer[NO]................... ................... [OKAY][NO] [OKAY] [NO] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- .......stochastic_transformer .......[OKAY]. [OKAY][NO] stochastic_transformer....... [OKAY]stochastic_transformer. [NO] ........ [NO][OKAY] ....... [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_ioasync_io .............................. [NO][NO] .............. [NO][NO] transformer_inferencetransformer_inference .... [NO][NO] .............. [OKAY][OKAY] utils .................. utils[YES] ........................ [YES][OKAY] ...... [OKAY] quantizer ..............quantizer [NO].............. .......[NO] [OKAY]....... [OKAY] -------------------------------------------------- -------------------------------------------------- ninjaninjaninjaninja ...................................................... .................. [OKAY][OKAY][OKAY][OKAY] ------------------------------------------------------------------------------------------------------------------------------------------------------ -------------------------------------------------- op name op name op name op name ................................ ................ ................ installedinstalled installed..installed.. compatible.... compatible --------------------------------------------------compatiblecompatible -------------------------------------------------- ---------------------------------------------------------------------------------------------------- cpu_adam ............... cpu_adamcpu_adam[YES] cpu_adam............... ............... ......[YES] ............... [OKAY][YES] ...... [YES]...... [OKAY]......[OKAY] [OKAY] fused_adam ............. 
[NO] fused_adamfused_adam.......fused_adam .............[OKAY].......................... [NO][NO][NO]fused_lamb .................................. [NO][OKAY][OKAY] [OKAY]....... fused_lamb[OKAY]fused_lambfused_lamb ....................................... [NO] [NO] [NO] ....... ....... ....... [OKAY] [OKAY] [OKAY] sparse_attn ............ [NO] ....... [OKAY] sparse_attnsparse_attn transformer sparse_attn............ ........................ ............[NO] [NO] [NO][NO]....... .....................[OKAY] [OKAY][OKAY] [OKAY]transformer transformer............ ............[NO] stochastic_transformertransformer [NO] ....... ............ ........ [OKAY] [NO][NO][OKAY] .............. stochastic_transformer [OKAY] [OKAY] stochastic_transformer . [NO]. stochastic_transformer....... [NO][OKAY] ........ [NO][OKAY] ....... [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja async_io ............... [NO] ....... [NO] -------------------------------------------------- DeepSpeed C++/CUDA extension op report ---------------------------------------------------------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. 
DeepSpeed C++/CUDA extension op report-------------------------------------------------- ----------------------------------------------------------------------------------------------------JIT compiled ops requires ninja NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.DeepSpeed C++/CUDA extension op report transformer_inference .. [NO] ....... [OKAY] ---------------------------------------------------------------------------------------------------- JIT compiled ops requires ninja NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- ninjaninjaninjaninja ........................................................................ [OKAY][OKAY][OKAY] [OKAY] -------------------------------------------------- ------------------------------------------------------------------------------------------------------------------------------------------------------ op name op name................op name op name ................ installed ................................installed .. installed installed ..compatible .. 
--------------------------------------------------
DeepSpeed C++/CUDA extension op report
--------------------------------------------------
NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.
--------------------------------------------------
JIT compiled ops requires ninja
ninja .................. [OKAY]
--------------------------------------------------
op name ................ installed .. compatible
--------------------------------------------------
 [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`.
async_io ............... [NO] ....... [NO]
cpu_adam ............... [YES] ...... [OKAY]
fused_adam ............. [NO] ....... [OKAY]
fused_lamb ............. [NO] ....... [OKAY]
sparse_attn ............ [NO] ....... [OKAY]
transformer ............ [NO] ....... [OKAY]
stochastic_transformer . [NO] ....... [OKAY]
transformer_inference .. [NO] ....... [OKAY]
utils .................. [YES] ...... [OKAY]
quantizer .............. [NO] ....... [OKAY]
--------------------------------------------------
Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- compatible-------------------------------------------------- -------------------------------------------------- cpu_adam ...............cpu_adam cpu_adamcpu_adam ...............[YES]............... .....................[YES][YES] [OKAY]......[YES] ...... [OKAY] ...... [OKAY] [OKAY] fused_adam ............. fused_adam[NO] fused_adam.............fused_adam .............[NO] ....... ............. [NO]....... [OKAY] [NO] [OKAY]....... .......fused_lamb[OKAY] fused_lamb.............[OKAY] fused_lamb[NO]............. fused_lamb.......[NO] ............. .............[OKAY] .......[NO][NO] [OKAY].............. [OKAY][OKAY] sparse_attn ............ [NO] ....... sparse_attn[OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. ............sparse_attnsparse_attn transformer ............[NO] [NO]............ ............ ....... [NO]....... [NO] [OKAY][OKAY] ....... async_io [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. ............... ninjaninjaninjaninja ........................................................................ [OKAY] [OKAY][OKAY] [OKAY] -------------------------------------------------- ....... [OKAY]stochastic_transformertransformer[OKAY] [NO] ....... [NO] -------------------------------------------------- ----------------------------------------------------------------------------------------------------op name op name op name................................op name ................installed................installed installed..installed.. .... compatiblecompatible compatible compatible-------------------------------------------------- ............ 
.transformertransformer[NO] ............[NO]................... .......[NO][NO][OKAY] [OKAY].............. [OKAY]stochastic_transformer[OKAY] transformer_inferenceasync_io ................. [NO][NO] .............. [OKAY][NO] -------------------------------------------------- ---------------------------------------------------------------------------------------------------- utils .................. [YES] ...... [OKAY]transformer_inference cpu_adamcpu_adam cpu_adamcpu_adam............... ............... ............... [YES]............... [YES] [YES]...... [YES] [OKAY] ............ ...... [OKAY][OKAY] [OKAY] .stochastic_transformer stochastic_transformer[NO]. ....... .[NO][OKAY] [NO]....... .......[OKAY] [OKAY] .. [NO]quantizer ..................... [OKAY][NO] ....... [OKAY] fused_adam ............. [NO] fused_adam....... .............fused_adam[OKAY]fused_adam utils --------------------------------------------------.................. [NO] ............. ............. ....... fused_lamb[NO] [NO]............. [OKAY]....... ....... [NO][OKAY] fused_lamb.......[OKAY] [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] fused_lamb[OKAY]............. -------------------------------------------------- fused_lamb.............[NO] .............[NO]....... [NO].......[OKAY] [OKAY]....... sparse_attn [OKAY]............ [NO] ....... [OKAY] sparse_attnsparse_attn transformer ............ ............ ............sparse_attn [NO][NO][NO] ............ ..................... [NO][OKAY][OKAY] [OKAY] ....... transformer [OKAY]............transformer [NO]stochastic_transformer............transformer [NO]........ ............ .......[NO][OKAY][NO] ....... [OKAY] ....... [OKAY]stochastic_transformer [OKAY] stochastic_transformer. stochastic_transformer[NO]. ........[NO] [OKAY][NO]....... .......[OKAY] [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`.  
[WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`.  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] async_io....... ...............[NO]async_io [NO] ...................... [NO][NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] transformer_inference .. transformer_inference[NO] utils......... [NO][OKAY].................. .......[YES] [OKAY]...... [OKAY] utils .................. utilsquantizer[YES] ...................................... [YES][NO][OKAY] ............. [OKAY][OKAY] quantizer .............. quantizer[NO] -------------------------------------------------- .............. ....... [NO][OKAY] ....... [OKAY] -------------------------------------------------- --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... 
[OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- ---------------------------------------------------------------------------------------------------- DeepSpeed C++/CUDA extension op report --------------------------------------------------DeepSpeed C++/CUDA extension op report-------------------------------------------------- -------------------------------------------------- --------------------------------------------------DeepSpeed C++/CUDA extension op reportDeepSpeed C++/CUDA extension op reportNOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.------------------------------------------------------------------------------------------------------------------------------------------------------ JIT compiled ops requires ninjaNOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.--------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. 
JIT compiled ops requires ninja-------------------------------------------------- -------------------------------------------------- JIT compiled ops requires ninjaJIT compiled ops requires ninja  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- ninjaninjaninjaninja ........................................................................ [OKAY][OKAY][OKAY][OKAY] -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- op nameop nameop name op name ................................ ................ ................ installedinstalledinstalled ......installed compatiblecompatible compatible .. ------------------------------------------------------------------------------------------------------------------------------------------------------ compatible -------------------------------------------------- cpu_adamcpu_adam cpu_adam..............................cpu_adam [YES]..............................[YES] ...... [YES]...... [YES] [OKAY]......[OKAY]...... [OKAY][OKAY] fused_adam .............fused_adam [NO]fused_adamfused_adam............. .......[NO] ............. ............. [OKAY][NO].......[NO] [OKAY].............. fused_lamb[OKAY] fused_lamb[OKAY]............. 
.............fused_lamb[NO] [NO] .............fused_lamb ....... .......[NO]............. [OKAY][NO][OKAY]....... .......[OKAY] [OKAY] sparse_attnsparse_attn sparse_attn........................sparse_attn [NO][NO]........................ ....... ....... [NO][NO] [OKAY][OKAY] ....... ....... transformer[OKAY][OKAY] transformer ........................transformertransformer [NO][NO] ...................................... [NO][OKAY][OKAY][NO] .............. [OKAY]stochastic_transformer[OKAY] stochastic_transformer .stochastic_transformer. [NO]stochastic_transformer[NO] . ....... ....... .[NO] [OKAY][OKAY] [NO]....... .......[OKAY] [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- -------------------------------------------------- --------------------------------------------------DeepSpeed C++/CUDA extension op report -------------------------------------------------- DeepSpeed C++/CUDA extension op reportNOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. ------------------------------------------------------------------------------------------------------------------------------------------------------ --------------------------------------------------JIT compiled ops requires ninjaDeepSpeed C++/CUDA extension op reportNOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. 
--------------------------------------------------DeepSpeed C++/CUDA extension op report-------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.--------------------------------------------------JIT compiled ops requires ninja --------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. JIT compiled ops requires ninja -------------------------------------------------- JIT compiled ops requires ninja ninjaninjaninjaninja ........................................................................ [OKAY][OKAY][OKAY][OKAY] ------------------------------------------------------------------------------------------------------------------------------------------------------ -------------------------------------------------- op nameop nameop name op name ................................ ................installedinstalled ................ .. ..installed installed compatible compatible.. .. --------------------------------------------------compatiblecompatible-------------------------------------------------- ---------------------------------------------------------------------------------------------------- cpu_adam ...............cpu_adam [YES] cpu_adamcpu_adam ............... ...... ............... ...............[YES] [OKAY] [YES][YES] ...... ......[OKAY]...... [OKAY]fused_adam [OKAY] ............. [NO] ....... [OKAY] fused_adam fused_adam.............fused_lamb ..........................fused_adam[NO] [NO].............[NO]....... .......[NO]....... [OKAY] [OKAY] [OKAY] .......fused_lamb [OKAY] fused_lamb............. .............[NO]fused_lamb [NO]sparse_attn.................... ...................[NO] [OKAY] [NO] [OKAY]....... 
.......[OKAY] [OKAY] transformer sparse_attn............ sparse_attn............ [NO]............ sparse_attn.......[NO][NO] [OKAY].............. ............ [OKAY][OKAY][NO]stochastic_transformer ........ transformertransformer[OKAY] ............[NO]............ transformer .......[NO] [NO] ............ [OKAY].............. [NO][OKAY][OKAY] ....... [OKAY] stochastic_transformerstochastic_transformer ..stochastic_transformer [NO][NO] ............... [NO][OKAY] [OKAY]....... [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. -------------------------------------------------- async_io ...............async_io [NO] ...................... [NO][NO] ....... [NO] transformer_inference transformer_inference.. ..[NO] .......[NO] [OKAY]....... [OKAY] utilsutils .................................... [YES][YES] ............ [OKAY][OKAY] quantizerquantizer ............................ [NO][NO] .............. [OKAY][OKAY] ---------------------------------------------------------------------------------------------------- ninjaninjaninja ninja.................................... ..................[OKAY][OKAY].................. 
[OKAY]-------------------------------------------------- --------------------------------------------------[OKAY]-------------------------------------------------- op name op name................op name-------------------------------------------------- ................installed ................ op name..installedinstalled .. ................compatible .. compatible --------------------------------------------------installedcompatible -------------------------------------------------- ..--------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. compatible cpu_adam--------------------------------------------------cpu_adam async_io ............... [NO] ....... [NO] .............................. cpu_adam [YES] [YES] ..................... ...... [OKAY] cpu_adam[YES] [OKAY] ..................... [OKAY][YES] ...... [OKAY] transformer_inference .. [NO] ....... [OKAY] fused_adam .............fused_adam [NO]............. .......[NO]fused_adam .......[OKAY] .............fused_adam[OKAY] utils .................. [YES] ...... [OKAY] fused_lamb[NO]fused_lamb............. .......................... ....... [NO][NO][NO] [OKAY]..................... [OKAY] [OKAY] [OKAY] fused_lamb ............. fused_lamb[NO] .................... [NO][OKAY] sparse_attn ------------------------------------------------------------------------------------------------------------------------------------------------------ DeepSpeed C++/CUDA extension op report DeepSpeed C++/CUDA extension op report -------------------------------------------------- quantizer .............. [NO] ....... [OKAY] ....... sparse_attn ............ ............[OKAY][NO] [NO]....... 
.......[OKAY] [OKAY]sparse_attn DeepSpeed C++/CUDA extension op report ---------------------------------------------------------------------------------------------------- DeepSpeed C++/CUDA extension op report NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.--------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. ---------------------------------------------------------------------------------------------------- --------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system -------------------------------------------------- ............transformer transformer ............[NO]............ sparse_attn [NO] .......[NO] ............ ....... .......[OKAY][OKAY][NO] [OKAY] meet the required dependencies to JIT install the op.JIT compiled ops requires ninjaNOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. JIT compiled ops requires ninja ---------------------------------------------------------------------------------------------------- JIT compiled ops requires ninjaJIT compiled ops requires ninja .......transformer [OKAY]stochastic_transformer............stochastic_transformer [NO].. [NO]transformer .......[NO] ................... ....... [OKAY][OKAY][OKAY][NO] ....... stochastic_transformer[OKAY] . [NO]stochastic_transformer ....... .[OKAY] [NO] ....... [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. 
[NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- ---------------------------------------------------------------------------------------------------- --------------------------------------------------DeepSpeed C++/CUDA extension op reportDeepSpeed C++/CUDA extension op report-------------------------------------------------- ----------------------------------------------------------------------------------------------------DeepSpeed C++/CUDA extension op report DeepSpeed C++/CUDA extension op report NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- ------------------------------------------------------------------------------------------------------------------------------------------------------ JIT compiled ops requires ninjaNOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.JIT compiled ops requires ninja NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.-------------------------------------------------- --------------------------------------------------JIT compiled ops requires ninja JIT compiled ops requires ninja  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... 
[OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- -------------------------------------------------- --------------------------------------------------DeepSpeed C++/CUDA extension op report ---------------------------------------------------------------------------------------------------- DeepSpeed C++/CUDA extension op report--------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. DeepSpeed C++/CUDA extension op reportDeepSpeed C++/CUDA extension op report---------------------------------------------------------------------------------------------------- ----------------------------------------------------------------------------------------------------JIT compiled ops requires ninja NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- ----------------------------------------------------------------------------------------------------JIT compiled ops requires ninja JIT compiled ops requires ninjaJIT compiled ops requires ninja  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`.  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... 
[NO] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- async_io transformer_inference............... ..[NO] [NO]....... .......[NO] JIT compiled ops requires ninja-------------------------------------------------- --------------------------------------------------DeepSpeed C++/CUDA extension op report --------------------------------------------------DeepSpeed C++/CUDA extension op report --------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.-------------------------------------------------- --------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. DeepSpeed C++/CUDA extension op report JIT compiled ops requires ninja -------------------------------------------------- -------------------------------------------------- JIT compiled ops requires ninja [OKAY] NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system utils .................. [YES] ...... [OKAY] meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja transformer_inference quantizer.. ..............[NO] [NO]....... .......[OKAY] [OKAY] utils-------------------------------------------------- .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... 
--------------------------------------------------
DeepSpeed C++/CUDA extension op report
--------------------------------------------------
NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.
--------------------------------------------------
JIT compiled ops requires ninja
--------------------------------------------------
ninja .................. [OKAY]
--------------------------------------------------
op name ................ installed .. compatible
--------------------------------------------------
cpu_adam ............... [YES] ...... [OKAY]
fused_adam ............. [NO] ....... [OKAY]
fused_lamb ............. [NO] ....... [OKAY]
sparse_attn ............ [NO] ....... [OKAY]
transformer ............ [NO] ....... [OKAY]
stochastic_transformer . [NO] ....... [OKAY]
--------------------------------------------------

 [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`.
async_io ............... [NO] ....... [NO]
transformer_inference .. [NO] ....... [OKAY]
utils .................. [YES] ...... [OKAY]
quantizer .............. [NO] ....... [OKAY]
--------------------------------------------------

DeepSpeed general environment info:
torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']
torch version .................... 1.8.1
torch cuda version ............... 11.1
nvcc version ..................... 11.2
deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']
deepspeed info ................... 0.4.2+bc17042, bc17042, big-science
deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1
['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] sparse_attn .......................... [OKAY][OKAY][NO] torch version .................... 1.8.1 ....... [OKAY] sparse_attn ............transformer [NO]............ .......[NO] sparse_attnsparse_attn[OKAY] torch cuda version ............... 11.1 ....... ............[OKAY]transformer............ nvcc version ..................... 11.2 ............[NO][NO] stochastic_transformer ....... [NO]....... ........[OKAY] deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] [OKAY][OKAY][NO] transformer ................... transformerstochastic_transformer[OKAY] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 [NO]. ...................[NO] [NO][OKAY]....... .......[OKAY] [OKAY]stochastic_transformer . stochastic_transformer[NO] ........ [OKAY][NO] ....... [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed general environment info:DeepSpeed general environment info: torch install path ...............torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']1.8.1 torch cuda version torch version............... 11.1.................... nvcc version1.8.1 ..................... 11.2torch cuda version deepspeed install path............... ........... 
11.1['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] DeepSpeed general environment info:deepspeed infonvcc version ................... .....................0.4.2+bc17042, bc17042, big-science 11.2deepspeed wheel compiled w. torch install path...... deepspeed install path ...............torch 1.8, cuda 11.1 ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] deepspeed info torch version................... .................... 0.4.2+bc17042, bc17042, big-science1.8.1 deepspeed wheel compiled w.torch cuda version ..................... 11.1torch 1.8, cuda 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. DeepSpeed general environment info: DeepSpeed general environment info:  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] async_io ............... 
[NO]async_io ...................... [NO][NO] torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] ....... [NO] torch version .................... 1.8.1 transformer_inference .. transformer_inference [NO].. .......[NO] [OKAY]....... [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. torch cuda version ............... 11.1 torch version .................... 1.8.1 utilsutils .................................... [YES][YES] ............ [OKAY][OKAY] torch cuda version ............... 11.1 quantizerquantizer ............................ [NO][NO] .............. [OKAY][OKAY] async_io ............... [NO] ....... [NO] nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] nvcc version ..................... 11.2 ---------------------------------------------------------------------------------------------------- transformer_inference .. [NO] ....... [OKAY] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] utils .................. [YES] ...... [OKAY] deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 quantizer .............. [NO] ....... [OKAY] DeepSpeed general environment info:DeepSpeed general environment info: DeepSpeed general environment info: -------------------------------------------------- torch install path torch install path............... ............... 
['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']torch version torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] .................... torch version1.8.1 .................... 1.8.1torch cuda version ............... torch cuda version11.1 ...............nvcc version 11.1..................... nvcc version11.2 ..................... deepspeed install path11.2 torch version .................... 1.8.1 ........... deepspeed install path ...........['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] torch cuda version ............... 11.1 ...................deepspeed info 0.4.2+bc17042, bc17042, big-science................... 0.4.2+bc17042, bc17042, big-sciencedeepspeed wheel compiled w. deepspeed wheel compiled w....... ......torch 1.8, cuda 11.1 torch 1.8, cuda 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. 
Can be fixed by: `apt install libaio-dev`. [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_ioasync_io .............................. [NO][NO] .............. [NO][NO] transformer_inferencetransformer_inference .... [NO][NO] .............. [OKAY][OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. utils[NO] ......................... [OKAY] [YES]-------------------------------------------------- ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`.  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] transformer_inference .. [NO]utils ......................... [OKAY] [YES] ...... [OKAY] utils ..................quantizer [YES] .................... [NO][OKAY] ....... [OKAY] quantizer --------------------------------------------------.............. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed general environment info:DeepSpeed general environment info: torch install pathtorch install path ............... ............... 
['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch versiontorch version ........................................ 1.8.11.8.1 torch cuda versiontorch cuda version .............................. 11.111.1 nvcc versionnvcc version .......................................... 11.211.2 deepspeed install pathdeepspeed install path ...................... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed infodeepspeed info ...................................... 0.4.2+bc17042, bc17042, big-science0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w.deepspeed wheel compiled w. ............ torch 1.8, cuda 11.1torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 -------------------------------------------------- DeepSpeed C++/CUDA extension op report-------------------------------------------------- -------------------------------------------------- DeepSpeed C++/CUDA extension op reportNOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. 
---------------------------------------------------------------------------------------------------- -------------------------------------------------- --------------------------------------------------DeepSpeed C++/CUDA extension op reportDeepSpeed C++/CUDA extension op report NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. JIT compiled ops requires ninja ---------------------------------------------------------------------------------------------------- --------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. JIT compiled ops requires ninja---------------------------------------------------------------------------------------------------- JIT compiled ops requires ninjaJIT compiled ops requires ninja ---------------------------------------------------------------------------------------------------- --------------------------------------------------DeepSpeed C++/CUDA extension op reportDeepSpeed C++/CUDA extension op report -------------------------------------------------- --------------------------------------------------DeepSpeed C++/CUDA extension op report--------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. ----------------------------------------------------------------------------------------------------DeepSpeed C++/CUDA extension op reportNOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. 
Op compatibility means that your system meet the required dependencies to JIT install the op. JIT compiled ops requires ninjaNOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.-------------------------------------------------- -------------------------------------------------- --------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.JIT compiled ops requires ninja JIT compiled ops requires ninja -------------------------------------------------- JIT compiled ops requires ninja  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] async_io async_io............... [NO]............... .......[NO] [NO]....... [NO] quantizer .............. [NO] ....... [OKAY] transformer_inference ..transformer_inference [NO].. .......[NO] [OKAY]....... -------------------------------------------------- [OKAY] utils ..................utils [YES].................. ......[YES] [OKAY]...... 
[OKAY] quantizer ..............quantizer [NO].............. .......[NO] [OKAY]....... [OKAY] -------------------------------------------------- --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`.  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. utils .................. [YES] ...... [OKAY] async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] async_io ............... [NO] ....... [NO] quantizer .............. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- -------------------------------------------------- transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... 
[OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- ninjaninjaninjaninja .................................... .................. .................. [OKAY] [OKAY][OKAY] [OKAY] -------------------------------------------------- -------------------------------------------------- ---------------------------------------------------------------------------------------------------- op nameop name op nameop name ................ ................................installed................ installed ..installed installed .. compatible..compatible.. --------------------------------------------------compatiblecompatible -------------------------------------------------- ---------------------------------------------------------------------------------------------------- cpu_adamcpu_adam cpu_adam............... cpu_adam............... [YES]...............[YES]............... ......[YES] ...... ...... [YES] [OKAY][OKAY] [OKAY] ...... [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. fused_adamfused_adam .......................... fused_adam[NO] fused_adam[NO] ........................................ [OKAY][NO][NO][OKAY] async_io ............... [NO] ....... 
[NO] ..............fused_lamb [OKAY]fused_lamb[OKAY]............. .............[NO] fused_lamb[NO]fused_lamb....... [OKAY]................................. [NO][NO][OKAY] transformer_inference .. [NO] ....... [OKAY] .............. [OKAY][OKAY] sparse_attn ............ [NO] .......sparse_attn [OKAY]............ utils .................. [YES] ...... [OKAY] [NO]sparse_attn transformersparse_attn ....... ............ ........................[OKAY] [NO] [NO] transformer.......[NO] ..........................[OKAY] quantizer .............. [NO] ....... [OKAY] [OKAY][OKAY][NO] transformer -------------------------------------------------- ................... transformerstochastic_transformer[OKAY] [NO]............ ........stochastic_transformer [NO][NO]. [OKAY].......[NO]....... [OKAY]....... [OKAY]stochastic_transformer [OKAY] . stochastic_transformer[NO] ....... .[OKAY] [NO] ....... [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] async_io....... [NO]............... [NO] ....... [NO] transformer_inferencetransformer_inference .... [NO][NO] .............. [OKAY][OKAY] utilsutils .................................... [YES][YES] ............ [OKAY][OKAY] quantizerquantizer ............................ [NO][NO] .............. [OKAY][OKAY] ----------------------------------------------------------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ...............async_io [NO] ...................... [NO] .......[NO] [NO] transformer_inference transformer_inference.. ..[NO] [NO]....... 
.......[OKAY] [OKAY] utilsutils .................................... [YES][YES] ............ [OKAY] [OKAY] quantizer quantizer.............. ..............[NO] [NO]....... .......[OKAY] [OKAY] -------------------------------------------------- --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_ioasync_io  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. .............................. [NO] [NO] .............. [NO][NO] transformer_inference transformer_inference.. ..[NO] [NO]....... async_io .......[OKAY] ...............[OKAY] [NO] ....... [NO] utils utils.................. ..................[YES] [YES]...... ......[OKAY] [OKAY] transformer_inferencequantizerquantizer .............................. [NO][NO][NO] ..................... [OKAY][OKAY][OKAY] ---------------------------------------------------------------------------------------------------- utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] ninjaninjaninjaninja ........................................................................ [OKAY][OKAY][OKAY][OKAY] ------------------------------------------------------------------------------------------------------------------------------------------------------ -------------------------------------------------- op name transformer_inference .. [NO] ....... [OKAY] op name op nameop name ................................ ................installed ................ installed installed.. installed .. 
compatible.. ..--------------------------------------------------compatiblecompatible compatible-------------------------------------------------- utils .................. [YES] ...... [OKAY] ---------------------------------------------------------------------------------------------------- cpu_adam ............... [YES] cpu_adam...... ...............[OKAY] quantizer .............. [NO] ....... [OKAY] cpu_adam[YES]cpu_adam .................................... [OKAY][YES][YES] -------------------------------------------------- fused_adam............ .............[OKAY][OKAY] [NO] fused_adam .................... [OKAY][NO] ....... fused_lamb[OKAY] fused_adamfused_adam............. fused_lamb .......................... [NO] .............[NO]....... [NO] [NO].......[OKAY] .............. [OKAY] [OKAY][OKAY] fused_lambfused_lamb .......................... [NO][NO] sparse_attn....... sparse_attn...................[OKAY] ............ [NO][OKAY][NO] .............. [OKAY][OKAY] sparse_attntransformertransformer ........................ ............sparse_attn[NO][NO] [NO]................... .......[OKAY] .......[NO] [OKAY] [OKAY]stochastic_transformer ....... transformer [OKAY]............. stochastic_transformer [NO][NO] transformer ............... [OKAY] [OKAY]............ [NO] [NO]....... .......[OKAY]stochastic_transformer [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. . [NO]stochastic_transformer ....... [OKAY].  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] [NO] ....... [OKAY] async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] transformer_inference .. utils[NO] ......................... [YES][OKAY] ...... [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. 
Can be fixed by: `apt install libaio-dev`.

[The same `ds_report` output was printed by every process, interleaved; a single deduplicated copy follows.]

 [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`.

--------------------------------------------------
DeepSpeed C++/CUDA extension op report
--------------------------------------------------
NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.
--------------------------------------------------
JIT compiled ops requires ninja
--------------------------------------------------

async_io ............... [NO] ....... [NO]
transformer_inference .. [NO] ....... [OKAY]
utils .................. [YES] ...... [OKAY]
quantizer .............. [NO] ....... [OKAY]
--------------------------------------------------

DeepSpeed general environment info:
torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']
torch version .................... 1.8.1
torch cuda version ............... 11.1
nvcc version ..................... 11.2
deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']
deepspeed info ................... 0.4.2+bc17042, bc17042, big-science
deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1

/bin/sh: line 0: type: git: not found
**** Git info for Megatron: git_hash=unknown git_branch=unknown ****
[OKAY] -------------------------------------------------- /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 **** Git info for Megatron: git_hash=unknown git_branch=unknown **** torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found DeepSpeed general environment info:DeepSpeed general environment info: torch install path ............... torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] **** Git info for Megatron: git_hash=unknown git_branch=unknown **** .................... 1.8.1torch version ....................torch cuda version 1.8.1............... 11.1 torch cuda version nvcc version............... .....................11.1 11.2 nvcc version deepspeed install path..................... 
...........11.2 ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']deepspeed install path ...........deepspeed info ...................['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] 0.4.2+bc17042, bc17042, big-science deepspeed infodeepspeed wheel compiled w. ......................... 0.4.2+bc17042, bc17042, big-sciencetorch 1.8, cuda 11.1 deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 **** Git info for Megatron: git_hash=unknown git_branch=unknown **** ninjaninjaninjaninja .................. .................. ..................[OKAY][OKAY].................. [OKAY]----------------------------------------------------------------------------------------------------[OKAY] --------------------------------------------------op nameop name --------------------------------------------------................................op name installedinstalled................op name .... ................ compatibleinstalled compatible installed..-------------------------------------------------- ..--------------------------------------------------compatible compatible -------------------------------------------------- -------------------------------------------------- cpu_adam ...............cpu_adam [YES]............... ......cpu_adam[YES]cpu_adam [OKAY]............... /bin/sh: line 0: type: git: not found ............... [YES][YES] .................. [OKAY][OKAY][OKAY] fused_adam ............. [NO] ....... [OKAY] fused_adamfused_adam fused_lamb.......................... .............[NO]fused_adam[NO] [NO] ....... ....... .............[OKAY] ....... [NO][OKAY] [OKAY].......fused_lamb ............. [NO] [OKAY]fused_lamb....... .............[OKAY]fused_lamb sparse_attn[NO] ................................ [NO][OKAY][NO] ....... [OKAY]sparse_attn ............transformer .......[NO]............ .......[NO]sparse_attn[OKAY] ............ 
.......[OKAY] [NO][OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. transformer....... ............[OKAY] stochastic_transformer[NO] async_io ............... [NO] ....... [NO] transformer........ [NO]............[OKAY]sparse_attn .......[NO]............ stochastic_transformer [OKAY]....... **** Git info for Megatron: git_hash=unknown git_branch=unknown **** transformer_inference .. [NO] ....... [OKAY] [NO].[OKAY] .......[NO]stochastic_transformer [OKAY]....... utils .................. [YES] ...... [OKAY] .[OKAY]transformer quantizer .............. [NO] ....... [OKAY] [NO] ....... ............[OKAY] -------------------------------------------------- [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`.  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... async_io[NO] ...................... [NO] ....... [NO][NO] transformer_inference transformer_inference.. [NO] ....... ..[OKAY] [NO] ....... [OKAY] utils .................. utils[YES] ........................ [YES][OKAY] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] quantizer .............. --------------------------------------------------[NO] ....... [OKAY] -------------------------------------------------- /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... 
[NO] ....... [NO] **** Git info for Megatron: git_hash=unknown git_branch=unknown **** transformer_inference .. [NO] ....... [OKAY] **** Git info for Megatron: git_hash=unknown git_branch=unknown **** utils .................. [YES] ...... [OKAY] DeepSpeed general environment info: quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 **** Git info for Megatron: git_hash=unknown git_branch=unknown **** nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] **** Git info for Megatron: git_hash=unknown git_branch=unknown **** deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. -------------------------------------------------- async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... 
[OKAY] /bin/sh: line 0: type: git: not found --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_ioasync_io .............................. [NO][NO] .............. [NO] [NO] transformer_inference transformer_inference.. ..[NO] [NO]....... .......[OKAY] [OKAY] utils .................. [YES]utils ........................ [OKAY][YES] ...... [OKAY] quantizer .............. [NO]quantizer ..................... [OKAY][NO] ....... [OKAY] -------------------------------------------------- -------------------------------------------------- **** Git info for Megatron: git_hash=unknown git_branch=unknown ****  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_ioasync_io .............................. [NO][NO] .............. [NO][NO] transformer_inference ..transformer_inference [NO].. [NO]....... .......[OKAY] [OKAY] utilsutils .................................... 
[YES][YES] ............ [OKAY][OKAY] quantizerquantizer ............................ [NO][NO] .............. [OKAY][OKAY] ----------------------------------------------------------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_ioasync_io .............................. [NO][NO] .............. [NO][NO]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. transformer_inferencetransformer_inference .... [NO][NO] ....... .......[OKAY] [OKAY] async_ioasync_io .............................. [NO][NO] .............. [NO][NO] utilsutils .................................... [YES][YES] ............ [OKAY][OKAY] transformer_inferencetransformer_inference .... [NO][NO] .............. [OKAY][OKAY] quantizerquantizer ............................ [NO][NO] .............. [OKAY][OKAY] utils utils.................. ..................[YES] [YES]...... ......[OKAY] ---------------------------------------------------------------------------------------------------- [OKAY] quantizer ..............quantizer [NO].............. .......[NO] [OKAY]....... [OKAY] -------------------------------------------------- -------------------------------------------------- DeepSpeed general environment info:DeepSpeed general environment info:DeepSpeed general environment info: torch install pathtorch install pathtorch install path ............................................. 
['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch versiontorch versiontorch version ............................................................ 1.8.11.8.11.8.1 torch cuda versiontorch cuda versiontorch cuda version ............................................. 11.111.111.1 nvcc versionnvcc versionnvcc version ............................................................... 11.211.211.2 deepspeed install pathdeepspeed install pathdeepspeed install path ................................. ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed infodeepspeed infodeepspeed info ......................................................... 0.4.2+bc17042, bc17042, big-science0.4.2+bc17042, bc17042, big-science0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w.deepspeed wheel compiled w. deepspeed wheel compiled w. ...... ...... ...... torch 1.8, cuda 11.1 torch 1.8, cuda 11.1 torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] async_io ............... [NO] async_io....... 
...............[NO] [NO] ....... [NO] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 transformer_inference ..transformer_inference [NO].. .......[NO] [OKAY]....... [OKAY] utilsutils .................................... [YES] [YES]...... ......[OKAY] [OKAY] quantizer .............. quantizer[NO] ..................... [NO][OKAY] ....... [OKAY] -------------------------------------------------- --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. DeepSpeed general environment info: async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] utils .................. [YES] ...... [OKAY] torch version .................... 1.8.1 quantizer .............. [NO] ....... [OKAY] torch cuda version ............... 11.1 -------------------------------------------------- nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. 
Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] DeepSpeed general environment info: utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 -------------------------------------------------- torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_ioasync_io .............................. [NO][NO] .............. [NO][NO] transformer_inference .. transformer_inference[NO] ......... [NO][OKAY] ....... [OKAY] utils .................. utils[YES] ........................ [YES] [OKAY]...... [OKAY] quantizer quantizer.............. ..............[NO] [NO]....... .......[OKAY] [OKAY] -------------------------------------------------- -------------------------------------------------- **** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info:DeepSpeed general environment info: torch install pathtorch install path .............................. ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch versiontorch version ........................................ 
1.8.11.8.1 torch cuda versiontorch cuda version .............................. 11.111.1 nvcc versionnvcc version .......................................... 11.211.2 deepspeed install pathdeepspeed install path ...................... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info deepspeed info................... ...................0.4.2+bc17042, bc17042, big-science 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w.deepspeed wheel compiled w. ............ torch 1.8, cuda 11.1torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed general environment info: DeepSpeed general environment info: torch install path ............... torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'].................... 
1.8.1 torch version ....................torch cuda version ...............1.8.1 11.1 nvcc versiontorch cuda version .................................... 11.211.1 deepspeed install pathnvcc version ................................ 11.2['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed install pathdeepspeed info .............................. 0.4.2+bc17042, bc17042, big-science['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed wheel compiled w.deepspeed info ......................... torch 1.8, cuda 11.1 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info:DeepSpeed general environment info: torch install path ...............torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch versiontorch version ........................................ 1.8.11.8.1 torch cuda versiontorch cuda version .............................. 11.111.1 nvcc versionnvcc version .......................................... 11.211.2 deepspeed install pathdeepspeed install path ...................... 
['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed infodeepspeed info ...................................... 0.4.2+bc17042, bc17042, big-science0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w.deepspeed wheel compiled w. ............ torch 1.8, cuda 11.1torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`.  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] transformer_inference .. utils[NO] ......................... [YES][OKAY] ...... [OKAY] utilsquantizer ................................ [YES][NO] ............. [OKAY][OKAY] quantizer-------------------------------------------------- .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... 
[OKAY] -------------------------------------------------- DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... 
[WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`.
async_io ............... [NO] ....... [NO]
transformer_inference .. [NO] ....... [OKAY]
utils .................. [YES] ...... [OKAY]
quantizer .............. [NO] ....... [OKAY]
--------------------------------------------------
DeepSpeed general environment info:
torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']
torch version .................... 1.8.1
torch cuda version ............... 11.1
nvcc version ..................... 11.2
deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']
deepspeed info ................... 0.4.2+bc17042, bc17042, big-science
deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1
/bin/sh: line 0: type: git: not found
**** Git info for Megatron: git_hash=unknown git_branch=unknown ****
['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 torch versiontorch version ........................................ 1.8.11.8.1 torch cuda versiontorch cuda version .............................. 11.111.1 nvcc versionnvcc version .......................................... 11.211.2 deepspeed install pathdeepspeed install path ...................... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info deepspeed info................... ...................0.4.2+bc17042, bc17042, big-science 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w.deepspeed wheel compiled w. ............ torch 1.8, cuda 11.1torch 1.8, cuda 11.1 DeepSpeed general environment info:DeepSpeed general environment info: torch install path ...............torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version ....................torch version 1.8.1.................... 1.8.1 torch cuda version torch cuda version............... ............... 11.111.1 nvcc versionnvcc version .......................................... 11.211.2 deepspeed install pathdeepspeed install path ...................... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed infodeepspeed info ...................................... 0.4.2+bc17042, bc17042, big-science0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. deepspeed wheel compiled w....... 
......torch 1.8, cuda 11.1 torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science /bin/sh: line 0: type: git: not found deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found DeepSpeed general environment info:DeepSpeed general environment info: torch install pathtorch install path .............................. ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch versiontorch version ........................................ 1.8.11.8.1 torch cuda versiontorch cuda version .............................. 11.111.1 nvcc versionnvcc version .......................................... 11.211.2 deepspeed install pathdeepspeed install path ...................... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed infodeepspeed info ...................................... 0.4.2+bc17042, bc17042, big-science0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w.deepspeed wheel compiled w. ............ torch 1.8, cuda 11.1torch 1.8, cuda 11.1 **** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info:DeepSpeed general environment info: torch install pathtorch install path .............................. 
['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch versiontorch version ........................................ 1.8.11.8.1 torch cuda versiontorch cuda version .............................. 11.111.1 nvcc versionnvcc version .......................................... 11.211.2 deepspeed install pathdeepspeed install path ...................... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info deepspeed info................... ...................0.4.2+bc17042, bc17042, big-science 0.4.2+bc17042, bc17042, big-sciencedeepspeed wheel compiled w. deepspeed wheel compiled w....... ......torch 1.8, cuda 11.1 torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown ******** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown ******** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. 
Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja DeepSpeed general environment info:DeepSpeed general environment info: torch install pathtorch install path .............................. ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch versiontorch version ........................................ 1.8.11.8.1 torch cuda versiontorch cuda version .............................. 11.111.1 nvcc versionnvcc version .......................................... 11.211.2 deepspeed install pathdeepspeed install path ...................... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed infodeepspeed info ...................................... 0.4.2+bc17042, bc17042, big-science0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w.deepspeed wheel compiled w. ............ torch 1.8, cuda 11.1torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... 
[OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 DeepSpeed general environment info:DeepSpeed general environment info: nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 torch install pathtorch install path .............................. ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch versiontorch version ........................................ 1.8.11.8.1 torch cuda version ............... 11.1 torch cuda versionnvcc version .................................... 11.111.2 nvcc versiondeepspeed install path ................................ 11.2 ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']deepspeed install path deepspeed info........... ................... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ......deepspeed info torch 1.8, cuda 11.1................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 
11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: DeepSpeed general environment info: torch install path ............... torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'].................... 1.8.1 torch version ....................torch cuda version ...............1.8.1 11.1 torch cuda versionnvcc version .................................... 
11.111.2 nvcc versiondeepspeed install path ................................ 11.2 ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']deepspeed install path deepspeed info........... ................... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']0.4.2+bc17042, bc17042, big-science deepspeed infodeepspeed wheel compiled w. ......................... 0.4.2+bc17042, bc17042, big-sciencetorch 1.8, cuda 11.1 deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc versionDeepSpeed general environment info: ..................... 11.2 deepspeed install path ........... torch install path['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] /bin/sh: line 0: type: git: not found ...............deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']...... torch 1.8, cuda 11.1 torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... 
torch 1.8, cuda 11.1 **** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info:DeepSpeed general environment info: torch install pathtorch install path .............................. ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version torch version.................... ....................1.8.1 1.8.1 torch cuda version torch cuda version............... ...............11.1 11.1 nvcc version nvcc version..................... .....................11.2 11.2 deepspeed install path deepspeed install path........... ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed infodeepspeed info ...................................... 0.4.2+bc17042, bc17042, big-science0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w.deepspeed wheel compiled w. ............ torch 1.8, cuda 11.1torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 DeepSpeed general environment info:DeepSpeed general environment info: nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 torch install pathtorch install path ............... ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version torch version.................... 
....................1.8.1 1.8.1 torch cuda version torch cuda version............... ...............11.1 11.1nvcc version nvcc version..................... 11.2..................... 11.2deepspeed install path deepspeed install path........... ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']deepspeed info deepspeed info................... ...................0.4.2+bc17042, bc17042, big-science 0.4.2+bc17042, bc17042, big-sciencedeepspeed wheel compiled w. deepspeed wheel compiled w....... ......torch 1.8, cuda 11.1 torch 1.8, cuda 11.1 DeepSpeed general environment info: DeepSpeed general environment info:torch install path ............... torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'].................... 1.8.1 torch version torch cuda version.................... ...............1.8.1 11.1 torch cuda versionnvcc version .................................... 11.111.2 nvcc versiondeepspeed install path ................................ 11.2 ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']deepspeed install path deepspeed info........... ...................['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] 0.4.2+bc17042, bc17042, big-science deepspeed infodeepspeed wheel compiled w. ......................... torch 1.8, cuda 11.10.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info:DeepSpeed general environment info: torch install pathtorch install path .............................. 
['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch versiontorch version ........................................ 1.8.11.8.1 torch cuda versiontorch cuda version .............................. 11.111.1 nvcc versionnvcc version .......................................... 11.211.2 deepspeed install pathdeepspeed install path ...................... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed infodeepspeed info ...................................... 0.4.2+bc17042, bc17042, big-science0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w.deepspeed wheel compiled w. ............ torch 1.8, cuda 11.1torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... DeepSpeed general environment info: ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch install pathtorch version ................................... 1.8.1 torch cuda version ...............['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] 11.1 torch versionnvcc version ......................................... 1.8.111.2 deepspeed install path torch cuda version........... ............... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']11.1 deepspeed infonvcc version ........................................ 0.4.2+bc17042, bc17042, big-science11.2 deepspeed wheel compiled w.deepspeed install path ................. torch 1.8, cuda 11.1 ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... 
['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 DeepSpeed general environment info:torch cuda version ............... 11.1 nvcc version torch install path..................... 11.2............... deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']deepspeed info ................... torch version0.4.2+bc17042, bc17042, big-science ....................deepspeed wheel compiled w. 1.8.1...... torch 1.8, cuda 11.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 **** Git info for Megatron: git_hash=unknown git_branch=unknown **** nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... 
['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info:DeepSpeed general environment info: DeepSpeed general environment info: torch install pathtorch install path .............................. torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']torch versiontorch version ........................................ torch version 1.8.1 1.8.1 .................... torch cuda version1.8.1torch cuda version .............................. torch cuda version 11.1 11.1 ............... nvcc version nvcc version 11.1 ..................... ..................... nvcc version 11.2 11.2 ..................... deepspeed install path deepspeed install path 11.2 ........... ........... deepspeed install path ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']...........['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed infodeepspeed info['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] ...................................... deepspeed info 0.4.2+bc17042, bc17042, big-science 0.4.2+bc17042, bc17042, big-science ................... deepspeed wheel compiled w.deepspeed wheel compiled w. 0.4.2+bc17042, bc17042, big-science ...... ......deepspeed wheel compiled w.torch 1.8, cuda 11.1 torch 1.8, cuda 11.1...... 
torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info:DeepSpeed general environment info: /bin/sh: line 0: type: git: not found torch install path ...............torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']torch version .................... torch version1.8.1 .................... 1.8.1torch cuda version ............... torch cuda version11.1 ............... nvcc version11.1 ..................... nvcc version11.2 .....................deepspeed install path 11.2........... deepspeed install path ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']........... deepspeed info ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']................... deepspeed info0.4.2+bc17042, bc17042, big-science ...................deepspeed wheel compiled w. 0.4.2+bc17042, bc17042, big-science...... deepspeed wheel compiled w.torch 1.8, cuda 11.1 ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 **** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info:DeepSpeed general environment info: torch install pathtorch install path .............................. 
['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] -------------------------------------------------- torch versiontorch version ........................................ 1.8.11.8.1 torch cuda versiontorch cuda version .............................. 11.111.1 nvcc versionnvcc version .......................................... 11.211.2 DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja deepspeed install pathdeepspeed install path ...................... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed infodeepspeed info ...................................... 0.4.2+bc17042, bc17042, big-science0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w.deepspeed wheel compiled w. ...... ......torch 1.8, cuda 11.1 torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown ******** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info:DeepSpeed general environment info: torch install pathtorch install path .............................. ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch versiontorch version ........................................ 1.8.11.8.1 torch cuda versiontorch cuda version .............................. 
11.111.1 nvcc versionnvcc version .......................................... 11.211.2 deepspeed install pathdeepspeed install path ...................... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed infodeepspeed info ...................................... 0.4.2+bc17042, bc17042, big-science0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w.deepspeed wheel compiled w. ............ torch 1.8, cuda 11.1torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] **** Git info for Megatron: git_hash=unknown git_branch=unknown **** fused_adam ............. [NO] ....... [OKAY] DeepSpeed general environment info:DeepSpeed general environment info:DeepSpeed general environment info: fused_lamb ............. [NO] ....... [OKAY] **** Git info for Megatron: git_hash=unknown git_branch=unknown ******** Git info for Megatron: git_hash=unknown git_branch=unknown **** torch install pathtorch install path torch install path.............................. ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch versiontorch version torch version........................................ ....................1.8.11.8.1 1.8.1 torch cuda versiontorch cuda version torch cuda version ............... ..............................11.1 11.111.1nvcc version sparse_attn ............ [NO] ....... [OKAY] nvcc versionnvcc version..................... 
..........................................11.2 11.211.2deepspeed install path deepspeed install pathdeepspeed install path........... ...................... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']deepspeed info deepspeed infodeepspeed info................... ...................0.4.2+bc17042, bc17042, big-science................... 0.4.2+bc17042, bc17042, big-sciencedeepspeed wheel compiled w.0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w.......deepspeed wheel compiled w. ......torch 1.8, cuda 11.1...... torch 1.8, cuda 11.1torch 1.8, cuda 11.1 transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info:DeepSpeed general environment info: torch install pathtorch install path .............................. ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch versiontorch version ........................................ 1.8.11.8.1 torch cuda versiontorch cuda version .............................. 11.111.1 nvcc versionnvcc version .......................................... 11.211.2 deepspeed install pathdeepspeed install path ...................... 
['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed infodeepspeed info ...................................... 0.4.2+bc17042, bc17042, big-science0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w.deepspeed wheel compiled w. ...... ......torch 1.8, cuda 11.1 torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found DeepSpeed general environment info: torch install path ............... DeepSpeed general environment info: ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch install path torch version............... .................... 1.8.1 torch cuda version ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']............... 11.1 torch versionnvcc version ......................................... 1.8.111.2 deepspeed install path torch cuda version........... ............... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']11.1 deepspeed infonvcc version ........................................ 0.4.2+bc17042, bc17042, big-science11.2 deepspeed wheel compiled w.deepspeed install path ................. torch 1.8, cuda 11.1 ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] DeepSpeed general environment info: deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 
0.4.2+bc17042, bc17042, big-science **** Git info for Megatron: git_hash=unknown git_branch=unknown ******** Git info for Megatron: git_hash=unknown git_branch=unknown **** deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info:DeepSpeed general environment info: torch install pathtorch install path ............... ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']torch version .................... 1.8.1torch version ....................torch cuda version 1.8.1............... 11.1 nvcc versiontorch cuda version ..................... ...............11.2 11.1deepspeed install path ...........nvcc version ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']..................... deepspeed info11.2 ................... 0.4.2+bc17042, bc17042, big-sciencedeepspeed install path deepspeed wheel compiled w............ ...... torch 1.8, cuda 11.1 ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info: DeepSpeed general environment info:torch install path ............... torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] .................... 1.8.1torch version ....................torch cuda version 1.8.1............... 11.1torch cuda version /bin/sh: line 0: type: git: not found ...............nvcc version 11.1..................... 
11.2nvcc version deepspeed install path..................... ...........11.2 deepspeed install path['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] ...........deepspeed info ...................['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] 0.4.2+bc17042, bc17042, big-sciencedeepspeed info deepspeed wheel compiled w.................... ......0.4.2+bc17042, bc17042, big-science torch 1.8, cuda 11.1 deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info:DeepSpeed general environment info: torch install pathtorch install path ............... ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']torch version .................... torch version1.8.1 .................... 1.8.1torch cuda version ............... torch cuda version11.1 ...............nvcc version 11.1..................... nvcc version11.2 ..................... 
deepspeed install path11.2 ...........deepspeed install path ...........['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']deepspeed info ...................deepspeed info 0.4.2+bc17042, bc17042, big-science................... deepspeed wheel compiled w.0.4.2+bc17042, bc17042, big-science ......deepspeed wheel compiled w. torch 1.8, cuda 11.1...... torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info:DeepSpeed general environment info: torch install pathtorch install path .............................. ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch versiontorch version ........................................ 1.8.11.8.1 torch cuda versiontorch cuda version .............................. 11.111.1 nvcc versionnvcc version .......................................... 11.211.2 deepspeed install pathdeepspeed install path ...................... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] /bin/sh: line 0: type: git: not found deepspeed info deepspeed info................... ................... 0.4.2+bc17042, bc17042, big-science0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w.deepspeed wheel compiled w. ............ 
torch 1.8, cuda 11.1torch 1.8, cuda 11.1 **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info: DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch version .................... 1.8.1 torch cuda version ............... 11.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] DeepSpeed general environment info:deepspeed info ................... deepspeed install path ........... 
['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ......torch install path torch 1.8, cuda 11.1 deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 **** Git info for Megatron: git_hash=unknown git_branch=unknown **** deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] DeepSpeed general environment info:DeepSpeed general environment info: utils .................. [YES] ...... [OKAY] torch install pathtorch install path .............................. ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch versiontorch version ........................................ 1.8.11.8.1 quantizer .............. [NO] ....... 
[OKAY] -------------------------------------------------- torch cuda versiontorch cuda version .............................. 11.111.1 nvcc versionnvcc version .......................................... 11.211.2 deepspeed install pathdeepspeed install path ...................... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed infodeepspeed info ...................................... 0.4.2+bc17042, bc17042, big-science0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w.deepspeed wheel compiled w. ............ torch 1.8, cuda 11.1torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown ******** Git info for Megatron: git_hash=unknown git_branch=unknown ****  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO]async_io ...................... [NO][NO] ....... [NO] transformer_inferencetransformer_inference .... [NO][NO] .............. [OKAY][OKAY] utils .................. [YES] ......utils [OKAY].................. [YES] ...... quantizer[OKAY] .............. [NO] .......quantizer [OKAY].............. -------------------------------------------------- [NO] ....... 
[OKAY] -------------------------------------------------- /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown ******** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info: torch install path ............... 
['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info DeepSpeed general environment info:................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 
11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown ******** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown ******** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... 
['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] **** Git info for Megatron: git_hash=unknown git_branch=unknown **** torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] **** Git info for Megatron: git_hash=unknown git_branch=unknown **** torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... 
['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info:DeepSpeed general environment info: torch install pathtorch install path .............................. ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch versiontorch version ........................................ 1.8.11.8.1 torch cuda versiontorch cuda version .............................. 11.111.1 nvcc versionnvcc version .......................................... 11.211.2 deepspeed install pathdeepspeed install path ...................... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed infodeepspeed info ...................................... 0.4.2+bc17042, bc17042, big-science0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w.deepspeed wheel compiled w. ............ torch 1.8, cuda 11.1torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... 
['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 ---------------------------------------------------------------------------------------------------- DeepSpeed C++/CUDA extension op report-------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- -------------------------------------------------- DeepSpeed C++/CUDA extension op report NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. ---------------------------------------------------------------------------------------------------- -------------------------------------------------- --------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.JIT compiled ops requires ninjaDeepSpeed C++/CUDA extension op report JIT compiled ops requires ninja-------------------------------------------------- -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. 
Op compatibility means that your system meet the required dependencies to JIT install the op.JIT compiled ops requires ninja -------------------------------------------------- JIT compiled ops requires ninja /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info: /bin/sh: line 0: type: git: not found torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] /bin/sh: line 0: type: git: not found  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 transformer_inference .. [NO] ....... [OKAY] deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] utils .................. [YES] ...... [OKAY] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 quantizer .............. [NO] ....... 
[OKAY] -------------------------------------------------- **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info: torch install pathDeepSpeed general environment info: ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']torch install path ............... torch version .................... 1.8.1 ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']torch cuda version ............... torch version11.1 ....................nvcc version 1.8.1..................... 11.2torch cuda version deepspeed install path............... ...........11.1 nvcc version['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] .....................deepspeed info 11.2................... deepspeed install path0.4.2+bc17042, bc17042, big-science ...........deepspeed wheel compiled w. ......['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] torch 1.8, cuda 11.1 deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** ninjaninjaninjaninja .................. .................. 
.................................... [OKAY] [OKAY] [OKAY] [OKAY] -------------------------------------------------- ---------------------------------------------------------------------------------------------------- --------------------------------------------------op nameop name op name ................................ op name ................installed installed installed .................. .... compatible installed compatiblecompatible -------------------------------------------------- .. ---------------------------------------------------------------------------------------------------- compatible -------------------------------------------------- cpu_adam ............... cpu_adam[YES]cpu_adam .....................cpu_adam [YES]..............................[OKAY] ......[YES] [YES]......[OKAY] ......[OKAY] [OKAY]fused_adam ............. [NO] ....... [OKAY]fused_adam fused_adam .............fused_lambfused_adam .............[NO].......................... ....... [NO][NO] [NO] .......[OKAY]....... .......[OKAY] [OKAY] fused_lamb[OKAY] /bin/sh: line 0: type: git: not found .............fused_lamb fused_lamb [NO] ................................. [NO][NO][OKAY] /bin/sh: line 0: type: git: not found sparse_attn....... ....... ............ [OKAY] [OKAY] [NO] ....... [OKAY] /bin/sh: line 0: type: git: not found sparse_attntransformer ........................ [NO][NO] .............. sparse_attn sparse_attn[OKAY][OKAY] ........................ stochastic_transformer[NO] transformer [NO] ....... ............. .......[NO][OKAY][NO] [OKAY].............. [OKAY]transformer[OKAY] ............ transformer stochastic_transformer [NO] .................... [NO][NO][OKAY] .............. [OKAY] stochastic_transformer[OKAY] . [NO] stochastic_transformer....... [OKAY] . [NO] ....... 
**** Git info for Megatron: git_hash=unknown git_branch=unknown ****
DeepSpeed general environment info:
torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']
torch version .................... 1.8.1
torch cuda version ............... 11.1
nvcc version ..................... 11.2
deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']
deepspeed info ................... 0.4.2+bc17042, bc17042, big-science
deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1
[WARNING] async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`.
async_io ............... [NO] ....... [NO]
transformer_inference .. [NO] ....... [OKAY]
utils .................. [YES] ...... [OKAY]
quantizer .............. [NO] ....... [OKAY]
--------------------------------------------------
using world size: 256, data-parallel-size: 8, tensor-model-parallel size: 4, pipeline-model-parallel size: 8
using torch.float16 for parameters ...
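The parallel topology in the line above is internally consistent: the 256-GPU world size factors exactly into the three parallelism degrees. A minimal sanity-check sketch (a hypothetical helper, not Megatron code):

```python
def check_topology(world_size: int, dp: int, tp: int, pp: int) -> None:
    # Every rank belongs to exactly one (data, tensor, pipeline) coordinate,
    # so the three degrees must multiply to the total number of ranks.
    assert world_size == dp * tp * pp, f"{world_size} != {dp} * {tp} * {pp}"

# Values from the log: 256 GPUs = 8-way data x 4-way tensor x 8-way pipeline.
check_topology(world_size=256, dp=8, tp=4, pp=8)
```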
------------------------ arguments ------------------------
accumulate_allreduce_grads_in_fp32 .............. False
adam_beta1 ...................................... 0.9
adam_beta2 ...................................... 0.999
adam_eps ........................................ 1e-08
adlr_autoresume ................................. False
adlr_autoresume_interval ........................ 1000
apply_query_key_layer_scaling ................... True
apply_residual_connection_post_layernorm ........ False
attention_dropout ............................... 0.1
attention_softmax_in_fp32 ....................... False
bert_binary_head ................................ True
bert_load ....................................... None
bf16 ............................................ False
bias_dropout_fusion ............................. True
bias_gelu_fusion ................................ True
biencoder_projection_dim ........................ 0
biencoder_shared_query_context_model ............ False
block_data_path ................................. None
checkpoint_activations .......................... True
checkpoint_in_cpu ............................... False
checkpoint_num_layers ........................... 1
clip_grad ....................................... 1.0
codecarbon_dir .................................. /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/tr8-104B-logs/codecarbon
consumed_train_samples .......................... 0
consumed_valid_samples .......................... 0
contigious_checkpointing ........................ False
cpu_optimizer ................................... False
cpu_torch_adam .................................. False
data_impl ....................................... mmap
data_parallel_size .............................. 8
data_path ....................................... ['/gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document']
dataloader_type ................................. single
DDP_impl ........................................ local
decoder_seq_length .............................. None
deepscale ....................................... False
deepscale_config ................................ None
deepspeed ....................................... True
deepspeed_activation_checkpointing .............. True
deepspeed_config ................................ ./ds_config.1185609.json
deepspeed_mpi ................................... False
distribute_checkpointed_activations ............. False
distributed_backend ............................. nccl
embedding_path .................................. None
encoder_seq_length .............................. 2048
eod_mask_loss ................................... False
eval_interval ................................... 1000
eval_iters ...................................... 5
evidence_data_path .............................. None
exit_duration_in_mins ........................... 110
exit_interval ................................... None
ffn_hidden_size ................................. 20480
finetune ........................................ False
fp16 ............................................ True
fp16_lm_cross_entropy ........................... False
fp32_residual_connection ........................ False
global_batch_size ............................... 2048
hidden_dropout .................................. 0.1
hidden_size ..................................... 16384
hysteresis ...................................... 2
ict_head_size ................................... None
ict_load ........................................ None
img_dim ......................................... 224
indexer_batch_size .............................. 128
indexer_log_interval ............................ 1000
init_method_std ................................. 0.02
init_method_xavier_uniform ...................... False
initial_loss_scale .............................. 4294967296
kv_channels ..................................... 512
layernorm_epsilon ............................... 1e-05
lazy_mpu_init ................................... None
load ............................................ /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints
local_rank ...................................... 0
log_batch_size_to_tensorboard ................... True
log_interval .................................... 10
log_learning_rate_to_tensorboard ................ True
log_loss_scale_to_tensorboard ................... True
log_num_zeros_in_grad ........................... False
log_params_norm ................................. False
log_timers_to_tensorboard ....................... True
log_validation_ppl_to_tensorboard ............... True
loss_scale ...................................... 12.0
loss_scale_window ............................... 1000
lr .............................................. 6e-05
lr_decay_iters .................................. None
lr_decay_samples ................................ 126953125
lr_decay_style .................................. cosine
lr_warmup_fraction .............................. None
lr_warmup_iters ................................. 0
lr_warmup_samples ............................... 216320
make_vocab_size_divisible_by .................... 128
mask_prob ....................................... 0.15
masked_softmax_fusion ........................... True
max_position_embeddings ......................... 2048
memory_centric_tiled_linear ..................... False
merge_file ...................................... /gpfswork/rech/six/commun/code/tr8-104B/Megatron-DeepSpeed-tr8-104B/data/gpt2-merges.txt
micro_batch_size ................................ 1
min_loss_scale .................................. 1.0
min_lr .......................................... 6e-06
mmap_warmup ..................................... False
no_load_optim ................................... None
no_load_rng ..................................... None
no_save_optim ................................... None
no_save_rng ..................................... None
num_attention_heads ............................. 32
num_channels .................................... 3
num_classes ..................................... 1000
num_layers ...................................... 32
num_layers_per_virtual_pipeline_stage ........... None
num_workers ..................................... 2
onnx_safe ....................................... None
openai_gelu ..................................... False
optimizer ....................................... adam
override_lr_scheduler ........................... False
params_dtype .................................... torch.float16
partition_activations ........................... False
patch_dim ....................................... 16
pipeline_model_parallel_size .................... 8
position_embedding_type ......................... PositionEmbeddingType.absolute
profile_backward ................................ False
query_in_block_prob ............................. 0.1
rampup_batch_size ............................... ['16', '16', '6_000_000']
rank ............................................ 0
remote_device ................................... none
reset_attention_mask ............................ False
reset_position_ids .............................. False
retriever_report_topk_accuracies ................ []
retriever_score_scaling ......................... False
retriever_seq_length ............................ 256
sample_rate ..................................... 1.0
save ............................................ /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints
save_interval ................................... 1500
scatter_gather_tensors_in_pipeline .............. True
scattered_embeddings ............................ False
seed ............................................ 42
seq_length ...................................... 2048
sgd_momentum .................................... 0.9
short_seq_prob .................................. 0.1
split ........................................... 949,50,1
split_transformers .............................. False
synchronize_each_layer .......................... False
tensor_model_parallel_size ...................... 4
tensorboard_dir ................................. /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/tr8-104B-logs/tensorboard
tensorboard_log_interval ........................ 1
tensorboard_queue_size .......................... 5
tile_factor ..................................... 1
titles_data_path ................................ None
tokenizer_name_or_path .......................... None
tokenizer_type .................................. GPT2BPETokenizer
train_iters ..................................... None
train_samples ................................... 300000000
use_checkpoint_lr_scheduler ..................... False
use_contiguous_buffers_in_ddp ................... False
use_cpu_initialization .......................... None
use_one_sent_docs ............................... False
use_pin_memory .................................. False
virtual_pipeline_model_parallel_size ............ None
vocab_extra_ids ................................. 0
vocab_file ...................................... /gpfswork/rech/six/commun/code/tr8-104B/Megatron-DeepSpeed-tr8-104B/data/gpt2-vocab.json
weight_decay .................................... 0.1
world_size ...................................... 256
zero_allgather_bucket_size ...................... 0.0
zero_contigious_gradients ....................... False
zero_reduce_bucket_size ......................... 0.0
zero_reduce_scatter ............................. False
zero_stage ...................................... 1
-------------------- end of arguments ---------------------
will use batch size rampup starting from global batch size 16 to global batch size 2048 with batch size increments 16 over 6000000 samples.
> building GPT2BPETokenizer tokenizer ...
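The rampup line above describes a linear schedule: the global batch size starts at 16 and grows in steps of 16 until it reaches 2048, over the first 6,000,000 samples. A sketch of that schedule (not the Megatron-DeepSpeed implementation itself; it assumes the rampup samples are spread evenly across the (2048-16)/16 = 127 increments):

```python
def batch_size_at(consumed_samples: int, start: int = 16, increment: int = 16,
                  rampup_samples: int = 6_000_000, final: int = 2048) -> int:
    # Linear batch-size rampup: one step of `increment` per slice of
    # rampup_samples / num_increments consumed samples.
    num_increments = (final - start) // increment           # 127 here
    samples_per_increment = rampup_samples / num_increments
    if consumed_samples >= rampup_samples:
        return final
    steps = int(consumed_samples // samples_per_increment)
    return min(start + steps * increment, final)

print(batch_size_at(0))          # 16 at the start of training
print(batch_size_at(6_000_000))  # 2048 once rampup completes
```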
> padded vocab (size: 50257) with 431 dummy tokens (new size: 50688)
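The padded-vocab line follows from the arguments above: the tokenizer's 50257 entries are padded up to a multiple of make_vocab_size_divisible_by (128) times the tensor-model-parallel size (4), so the embedding table shards evenly across tensor-parallel ranks. A sketch of the arithmetic (not the Megatron implementation itself):

```python
import math

def pad_vocab(orig_size: int, divisible_by: int = 128, tp_size: int = 4) -> int:
    # The padded vocab must be a multiple of divisible_by * tp_size
    # (128 * 4 = 512 here) so each tensor-parallel shard is equal-sized.
    multiple = divisible_by * tp_size
    return math.ceil(orig_size / multiple) * multiple

padded = pad_vocab(50257)
print(padded, padded - 50257)  # 50688 431, matching the log line
```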
torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** > setting tensorboard ... > setting codecarbon ... -------------------------------------------------- DeepSpeed C++/CUDA extension op report-------------------------------------------------- -------------------------------------------------- DeepSpeed C++/CUDA extension op reportNOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- JIT compiled ops requires ninja NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. 
ninja .................. [OKAY]
--------------------------------------------------
op name ................ installed .. compatible
--------------------------------------------------
cpu_adam ............... [YES] ...... [OKAY]
fused_adam ............. [NO] ....... [OKAY]
fused_lamb ............. [NO] ....... [OKAY]
sparse_attn ............ [NO] ....... [OKAY]
transformer ............ [NO] ....... [OKAY]
stochastic_transformer . [NO] ....... [OKAY]
--------------------------------------------------
 [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`.
async_io ............... [NO] ....... [NO]
transformer_inference .. [NO] ....... [OKAY]
utils .................. [YES] ...... [OKAY]
quantizer .............. [NO] ....... [OKAY]
--------------------------------------------------
> initializing torch distributed ...
> initializing tensor model parallel with size 4
> initializing pipeline model parallel with size 8
> setting random seeds to 42 ...
[2021-09-25 02:35:45,964] [INFO] [checkpointing.py:226:model_parallel_cuda_manual_seed] > initializing model parallel cuda seeds on global rank 0, model parallel rank 0, and data parallel rank 0 with model parallel seed: 2760 and data parallel seed: 42
> compiling dataset index builder ...
make: Entering directory '/gpfsssd/worksf/projects/rech/six/commun/code/tr8-104B/Megatron-DeepSpeed-tr8-104B/megatron/data'
make: Nothing to be done for 'default'.
make: Leaving directory '/gpfsssd/worksf/projects/rech/six/commun/code/tr8-104B/Megatron-DeepSpeed-tr8-104B/megatron/data'
>>> done with dataset index builder. Compilation time: 0.305 seconds
> compiling and loading fused kernels ...
/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning:

                               !! WARNING !!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension.
See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source.
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
                               !! WARNING !!
warnings.warn(WRONG_COMPILER_WARNING.format( Detected CUDA files, patching ldflags Emitting ninja build file /gpfsssd/worksf/projects/rech/six/commun/code/tr8-104B/Megatron-DeepSpeed-tr8-104B/megatron/fused_kernels/build/build.ninja... Building extension module scaled_upper_triang_masked_softmax_cuda... Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N) ninja: no work to do. Loading extension module scaled_upper_triang_masked_softmax_cuda... /gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning: !! WARNING !! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension. See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source. !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !! WARNING !! warnings.warn(WRONG_COMPILER_WARNING.format( Detected CUDA files, patching ldflags Emitting ninja build file /gpfsssd/worksf/projects/rech/six/commun/code/tr8-104B/Megatron-DeepSpeed-tr8-104B/megatron/fused_kernels/build/build.ninja... Building extension module scaled_masked_softmax_cuda... Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N) ninja: no work to do. Loading extension module scaled_masked_softmax_cuda... /gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning: !! WARNING !! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! 
Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension. See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source. !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !! WARNING !! warnings.warn(WRONG_COMPILER_WARNING.format( Detected CUDA files, patching ldflags Emitting ninja build file /gpfsssd/worksf/projects/rech/six/commun/code/tr8-104B/Megatron-DeepSpeed-tr8-104B/megatron/fused_kernels/build/build.ninja... Building extension module fused_mix_prec_layer_norm_cuda... Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N) ninja: no work to do. Loading extension module fused_mix_prec_layer_norm_cuda... /gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning: !! WARNING !! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension. See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source. !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !! WARNING !! warnings.warn(WRONG_COMPILER_WARNING.format( /gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning: !! WARNING !! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! 
Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension. See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source. !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !! WARNING !! warnings.warn(WRONG_COMPILER_WARNING.format( /gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning: !! WARNING !! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension. See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source. !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !! WARNING !! warnings.warn(WRONG_COMPILER_WARNING.format( /gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning: !! WARNING !! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension. See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source. !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !! WARNING !! 
warnings.warn(WRONG_COMPILER_WARNING.format( /gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning: !! WARNING !! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension. See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source. !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !! WARNING !! warnings.warn(WRONG_COMPILER_WARNING.format( /gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning: !! WARNING !! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension. See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source. !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !! WARNING !! warnings.warn(WRONG_COMPILER_WARNING.format( /gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning: !! WARNING !! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to to compile your extension. 
Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension. See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source. !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !! WARNING !! warnings.warn(WRONG_COMPILER_WARNING.format( /gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning: !! WARNING !! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension. See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source. !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !! WARNING !! warnings.warn(WRONG_COMPILER_WARNING.format( /gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning: !! WARNING !! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension. See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source. !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !! WARNING !! warnings.warn(WRONG_COMPILER_WARNING.format( /gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning: !! WARNING !! 
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension. See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source. !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !! WARNING !! warnings.warn(WRONG_COMPILER_WARNING.format( /gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning: !! WARNING !! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension. See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source. !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !! WARNING !! warnings.warn(WRONG_COMPILER_WARNING.format( /gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning: !! WARNING !! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension. See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source. 
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !! WARNING !! warnings.warn(WRONG_COMPILER_WARNING.format( /gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning: !! WARNING !! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension. See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source. !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !! WARNING !! warnings.warn(WRONG_COMPILER_WARNING.format( /gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning: !! WARNING !! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension. See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source. !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !! WARNING !! warnings.warn(WRONG_COMPILER_WARNING.format( /gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning: !! WARNING !! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. 
Please use g++ to to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension. See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source. !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !! WARNING !! warnings.warn(WRONG_COMPILER_WARNING.format( /gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning: !! WARNING !! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension. See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source. !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !! WARNING !! warnings.warn(WRONG_COMPILER_WARNING.format( /gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning: !! WARNING !! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension. See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source. !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !! WARNING !! 
warnings.warn(WRONG_COMPILER_WARNING.format( /gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning: !! WARNING !! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension. See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source. !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !! WARNING !! warnings.warn(WRONG_COMPILER_WARNING.format( /gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning: !! WARNING !! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension. See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source. !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !! WARNING !! warnings.warn(WRONG_COMPILER_WARNING.format( /gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning: !! WARNING !! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to to compile your extension. 
Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension. See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source. !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !! WARNING !! warnings.warn(WRONG_COMPILER_WARNING.format( /gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning: !! WARNING !! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension. See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source. !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !! WARNING !! warnings.warn(WRONG_COMPILER_WARNING.format( /gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning: !! WARNING !! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension. See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source. !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !! WARNING !! warnings.warn(WRONG_COMPILER_WARNING.format( /gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning: !! WARNING !! 
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension. See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source. !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !! WARNING !! warnings.warn(WRONG_COMPILER_WARNING.format( /gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning: !! WARNING !! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension. See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source. !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !! WARNING !! warnings.warn(WRONG_COMPILER_WARNING.format( /gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning: !! WARNING !! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension. See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source. 
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !! WARNING !! warnings.warn(WRONG_COMPILER_WARNING.format( /gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning: !! WARNING !! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension. See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source. !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !! WARNING !! warnings.warn(WRONG_COMPILER_WARNING.format( /gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning: !! WARNING !! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension. See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source. !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !! WARNING !! warnings.warn(WRONG_COMPILER_WARNING.format( /gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning: !! WARNING !! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. 
Please use g++ to to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension. See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source. !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !! WARNING !! warnings.warn(WRONG_COMPILER_WARNING.format( /gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning: !! WARNING !! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension. See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source. !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !! WARNING !! warnings.warn(WRONG_COMPILER_WARNING.format( /gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning: !! WARNING !! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension. See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source. !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !! WARNING !! 
warnings.warn(WRONG_COMPILER_WARNING.format( /gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning: !! WARNING !! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension. See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source. !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !! WARNING !! warnings.warn(WRONG_COMPILER_WARNING.format( >>> done with compiling and loading fused kernels. Compilation time: 20.734 seconds time to initialize megatron (seconds): -8.955 [after megatron is initialized] datetime: 2021-09-25 02:36:07 building GPT model ... 
[2021-09-25 02:36:07,098] [INFO] [utils.py:680:see_memory_usage] Before Building Model
/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch/cuda/memory.py:373: FutureWarning: torch.cuda.memory_cached has been renamed to torch.cuda.memory_reserved
  warnings.warn(
/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch/cuda/memory.py:381: FutureWarning: torch.cuda.max_memory_cached has been renamed to torch.cuda.max_memory_reserved
  warnings.warn(
[2021-09-25 02:36:07,100] [INFO] [utils.py:681:see_memory_usage] MA 0.0 GB Max_MA 0.0 GB CA 0.0 GB Max_CA 0 GB
[2021-09-25 02:36:07,101] [INFO] [utils.py:689:see_memory_usage] CPU Virtual Memory: used = 36.67 GB, percent = 19.6%
SEED_LAYERS=False BASE_SEED=1234 SEED_FN=None
Using topology: {ProcessCoord(pipe=0, data=0, model=0): 0, ProcessCoord(pipe=0, data=0, model=1): 1, ProcessCoord(pipe=0, data=0, model=2): 2, ProcessCoord(pipe=0, data=0, model=3): 3, ProcessCoord(pipe=0, data=1, model=0): 4, ProcessCoord(pipe=0, data=1, model=1): 5, ProcessCoord(pipe=0, data=1, model=2): 6, ProcessCoord(pipe=0, data=1, model=3): 7, ProcessCoord(pipe=0, data=2, model=0): 8, ProcessCoord(pipe=0, data=2, model=1): 9, ProcessCoord(pipe=0, data=2, model=2): 10, ProcessCoord(pipe=0, data=2, model=3): 11, ProcessCoord(pipe=0, data=3, model=0): 12, ProcessCoord(pipe=0, data=3, model=1): 13, ProcessCoord(pipe=0, data=3, model=2): 14, ProcessCoord(pipe=0, data=3, model=3): 15, ProcessCoord(pipe=0, data=4, model=0): 16, ProcessCoord(pipe=0, data=4, model=1): 17, ProcessCoord(pipe=0, data=4, model=2): 18, ProcessCoord(pipe=0, data=4, model=3): 19, ProcessCoord(pipe=0, data=5, model=0): 20, ProcessCoord(pipe=0, data=5, model=1): 21, ProcessCoord(pipe=0, data=5, model=2): 22, ProcessCoord(pipe=0, data=5, model=3): 23, ProcessCoord(pipe=0, data=6, model=0): 24, ProcessCoord(pipe=0, data=6, model=1): 25, ProcessCoord(pipe=0, data=6, model=2): 26, ProcessCoord(pipe=0, data=6, model=3): 27, ProcessCoord(pipe=0,
data=7, model=0): 28, ProcessCoord(pipe=0, data=7, model=1): 29, ProcessCoord(pipe=0, data=7, model=2): 30, ProcessCoord(pipe=0, data=7, model=3): 31, ProcessCoord(pipe=1, data=0, model=0): 32, ProcessCoord(pipe=1, data=0, model=1): 33, ProcessCoord(pipe=1, data=0, model=2): 34, ProcessCoord(pipe=1, data=0, model=3): 35, ProcessCoord(pipe=1, data=1, model=0): 36, ProcessCoord(pipe=1, data=1, model=1): 37, ProcessCoord(pipe=1, data=1, model=2): 38, ProcessCoord(pipe=1, data=1, model=3): 39, ProcessCoord(pipe=1, data=2, model=0): 40, ProcessCoord(pipe=1, data=2, model=1): 41, ProcessCoord(pipe=1, data=2, model=2): 42, ProcessCoord(pipe=1, data=2, model=3): 43, ProcessCoord(pipe=1, data=3, model=0): 44, ProcessCoord(pipe=1, data=3, model=1): 45, ProcessCoord(pipe=1, data=3, model=2): 46, ProcessCoord(pipe=1, data=3, model=3): 47, ProcessCoord(pipe=1, data=4, model=0): 48, ProcessCoord(pipe=1, data=4, model=1): 49, ProcessCoord(pipe=1, data=4, model=2): 50, ProcessCoord(pipe=1, data=4, model=3): 51, ProcessCoord(pipe=1, data=5, model=0): 52, ProcessCoord(pipe=1, data=5, model=1): 53, ProcessCoord(pipe=1, data=5, model=2): 54, ProcessCoord(pipe=1, data=5, model=3): 55, ProcessCoord(pipe=1, data=6, model=0): 56, ProcessCoord(pipe=1, data=6, model=1): 57, ProcessCoord(pipe=1, data=6, model=2): 58, ProcessCoord(pipe=1, data=6, model=3): 59, ProcessCoord(pipe=1, data=7, model=0): 60, ProcessCoord(pipe=1, data=7, model=1): 61, ProcessCoord(pipe=1, data=7, model=2): 62, ProcessCoord(pipe=1, data=7, model=3): 63, ProcessCoord(pipe=2, data=0, model=0): 64, ProcessCoord(pipe=2, data=0, model=1): 65, ProcessCoord(pipe=2, data=0, model=2): 66, ProcessCoord(pipe=2, data=0, model=3): 67, ProcessCoord(pipe=2, data=1, model=0): 68, ProcessCoord(pipe=2, data=1, model=1): 69, ProcessCoord(pipe=2, data=1, model=2): 70, ProcessCoord(pipe=2, data=1, model=3): 71, ProcessCoord(pipe=2, data=2, model=0): 72, ProcessCoord(pipe=2, data=2, model=1): 73, ProcessCoord(pipe=2, data=2, model=2): 74, 
ProcessCoord(pipe=2, data=2, model=3): 75, ProcessCoord(pipe=2, data=3, model=0): 76, ProcessCoord(pipe=2, data=3, model=1): 77, ProcessCoord(pipe=2, data=3, model=2): 78, ProcessCoord(pipe=2, data=3, model=3): 79, ProcessCoord(pipe=2, data=4, model=0): 80, ProcessCoord(pipe=2, data=4, model=1): 81, ProcessCoord(pipe=2, data=4, model=2): 82, ProcessCoord(pipe=2, data=4, model=3): 83, ProcessCoord(pipe=2, data=5, model=0): 84, ProcessCoord(pipe=2, data=5, model=1): 85, ProcessCoord(pipe=2, data=5, model=2): 86, ProcessCoord(pipe=2, data=5, model=3): 87, ProcessCoord(pipe=2, data=6, model=0): 88, ProcessCoord(pipe=2, data=6, model=1): 89, ProcessCoord(pipe=2, data=6, model=2): 90, ProcessCoord(pipe=2, data=6, model=3): 91, ProcessCoord(pipe=2, data=7, model=0): 92, ProcessCoord(pipe=2, data=7, model=1): 93, ProcessCoord(pipe=2, data=7, model=2): 94, ProcessCoord(pipe=2, data=7, model=3): 95, ProcessCoord(pipe=3, data=0, model=0): 96, ProcessCoord(pipe=3, data=0, model=1): 97, ProcessCoord(pipe=3, data=0, model=2): 98, ProcessCoord(pipe=3, data=0, model=3): 99, ProcessCoord(pipe=3, data=1, model=0): 100, ProcessCoord(pipe=3, data=1, model=1): 101, ProcessCoord(pipe=3, data=1, model=2): 102, ProcessCoord(pipe=3, data=1, model=3): 103, ProcessCoord(pipe=3, data=2, model=0): 104, ProcessCoord(pipe=3, data=2, model=1): 105, ProcessCoord(pipe=3, data=2, model=2): 106, ProcessCoord(pipe=3, data=2, model=3): 107, ProcessCoord(pipe=3, data=3, model=0): 108, ProcessCoord(pipe=3, data=3, model=1): 109, ProcessCoord(pipe=3, data=3, model=2): 110, ProcessCoord(pipe=3, data=3, model=3): 111, ProcessCoord(pipe=3, data=4, model=0): 112, ProcessCoord(pipe=3, data=4, model=1): 113, ProcessCoord(pipe=3, data=4, model=2): 114, ProcessCoord(pipe=3, data=4, model=3): 115, ProcessCoord(pipe=3, data=5, model=0): 116, ProcessCoord(pipe=3, data=5, model=1): 117, ProcessCoord(pipe=3, data=5, model=2): 118, ProcessCoord(pipe=3, data=5, model=3): 119, ProcessCoord(pipe=3, data=6, model=0): 120, 
ProcessCoord(pipe=3, data=6, model=1): 121, ProcessCoord(pipe=3, data=6, model=2): 122, ProcessCoord(pipe=3, data=6, model=3): 123, ProcessCoord(pipe=3, data=7, model=0): 124, ProcessCoord(pipe=3, data=7, model=1): 125, ProcessCoord(pipe=3, data=7, model=2): 126, ProcessCoord(pipe=3, data=7, model=3): 127, ProcessCoord(pipe=4, data=0, model=0): 128, ProcessCoord(pipe=4, data=0, model=1): 129, ProcessCoord(pipe=4, data=0, model=2): 130, ProcessCoord(pipe=4, data=0, model=3): 131, ProcessCoord(pipe=4, data=1, model=0): 132, ProcessCoord(pipe=4, data=1, model=1): 133, ProcessCoord(pipe=4, data=1, model=2): 134, ProcessCoord(pipe=4, data=1, model=3): 135, ProcessCoord(pipe=4, data=2, model=0): 136, ProcessCoord(pipe=4, data=2, model=1): 137, ProcessCoord(pipe=4, data=2, model=2): 138, ProcessCoord(pipe=4, data=2, model=3): 139, ProcessCoord(pipe=4, data=3, model=0): 140, ProcessCoord(pipe=4, data=3, model=1): 141, ProcessCoord(pipe=4, data=3, model=2): 142, ProcessCoord(pipe=4, data=3, model=3): 143, ProcessCoord(pipe=4, data=4, model=0): 144, ProcessCoord(pipe=4, data=4, model=1): 145, ProcessCoord(pipe=4, data=4, model=2): 146, ProcessCoord(pipe=4, data=4, model=3): 147, ProcessCoord(pipe=4, data=5, model=0): 148, ProcessCoord(pipe=4, data=5, model=1): 149, ProcessCoord(pipe=4, data=5, model=2): 150, ProcessCoord(pipe=4, data=5, model=3): 151, ProcessCoord(pipe=4, data=6, model=0): 152, ProcessCoord(pipe=4, data=6, model=1): 153, ProcessCoord(pipe=4, data=6, model=2): 154, ProcessCoord(pipe=4, data=6, model=3): 155, ProcessCoord(pipe=4, data=7, model=0): 156, ProcessCoord(pipe=4, data=7, model=1): 157, ProcessCoord(pipe=4, data=7, model=2): 158, ProcessCoord(pipe=4, data=7, model=3): 159, ProcessCoord(pipe=5, data=0, model=0): 160, ProcessCoord(pipe=5, data=0, model=1): 161, ProcessCoord(pipe=5, data=0, model=2): 162, ProcessCoord(pipe=5, data=0, model=3): 163, ProcessCoord(pipe=5, data=1, model=0): 164, ProcessCoord(pipe=5, data=1, model=1): 165, 
ProcessCoord(pipe=5, data=1, model=2): 166, ProcessCoord(pipe=5, data=1, model=3): 167, ProcessCoord(pipe=5, data=2, model=0): 168, ProcessCoord(pipe=5, data=2, model=1): 169, ProcessCoord(pipe=5, data=2, model=2): 170, ProcessCoord(pipe=5, data=2, model=3): 171, ProcessCoord(pipe=5, data=3, model=0): 172, ProcessCoord(pipe=5, data=3, model=1): 173, ProcessCoord(pipe=5, data=3, model=2): 174, ProcessCoord(pipe=5, data=3, model=3): 175, ProcessCoord(pipe=5, data=4, model=0): 176, ProcessCoord(pipe=5, data=4, model=1): 177, ProcessCoord(pipe=5, data=4, model=2): 178, ProcessCoord(pipe=5, data=4, model=3): 179, ProcessCoord(pipe=5, data=5, model=0): 180, ProcessCoord(pipe=5, data=5, model=1): 181, ProcessCoord(pipe=5, data=5, model=2): 182, ProcessCoord(pipe=5, data=5, model=3): 183, ProcessCoord(pipe=5, data=6, model=0): 184, ProcessCoord(pipe=5, data=6, model=1): 185, ProcessCoord(pipe=5, data=6, model=2): 186, ProcessCoord(pipe=5, data=6, model=3): 187, ProcessCoord(pipe=5, data=7, model=0): 188, ProcessCoord(pipe=5, data=7, model=1): 189, ProcessCoord(pipe=5, data=7, model=2): 190, ProcessCoord(pipe=5, data=7, model=3): 191, ProcessCoord(pipe=6, data=0, model=0): 192, ProcessCoord(pipe=6, data=0, model=1): 193, ProcessCoord(pipe=6, data=0, model=2): 194, ProcessCoord(pipe=6, data=0, model=3): 195, ProcessCoord(pipe=6, data=1, model=0): 196, ProcessCoord(pipe=6, data=1, model=1): 197, ProcessCoord(pipe=6, data=1, model=2): 198, ProcessCoord(pipe=6, data=1, model=3): 199, ProcessCoord(pipe=6, data=2, model=0): 200, ProcessCoord(pipe=6, data=2, model=1): 201, ProcessCoord(pipe=6, data=2, model=2): 202, ProcessCoord(pipe=6, data=2, model=3): 203, ProcessCoord(pipe=6, data=3, model=0): 204, ProcessCoord(pipe=6, data=3, model=1): 205, ProcessCoord(pipe=6, data=3, model=2): 206, ProcessCoord(pipe=6, data=3, model=3): 207, ProcessCoord(pipe=6, data=4, model=0): 208, ProcessCoord(pipe=6, data=4, model=1): 209, ProcessCoord(pipe=6, data=4, model=2): 210, 
ProcessCoord(pipe=6, data=4, model=3): 211, ProcessCoord(pipe=6, data=5, model=0): 212, ProcessCoord(pipe=6, data=5, model=1): 213, ProcessCoord(pipe=6, data=5, model=2): 214, ProcessCoord(pipe=6, data=5, model=3): 215, ProcessCoord(pipe=6, data=6, model=0): 216, ProcessCoord(pipe=6, data=6, model=1): 217, ProcessCoord(pipe=6, data=6, model=2): 218, ProcessCoord(pipe=6, data=6, model=3): 219, ProcessCoord(pipe=6, data=7, model=0): 220, ProcessCoord(pipe=6, data=7, model=1): 221, ProcessCoord(pipe=6, data=7, model=2): 222, ProcessCoord(pipe=6, data=7, model=3): 223, ProcessCoord(pipe=7, data=0, model=0): 224, ProcessCoord(pipe=7, data=0, model=1): 225, ProcessCoord(pipe=7, data=0, model=2): 226, ProcessCoord(pipe=7, data=0, model=3): 227, ProcessCoord(pipe=7, data=1, model=0): 228, ProcessCoord(pipe=7, data=1, model=1): 229, ProcessCoord(pipe=7, data=1, model=2): 230, ProcessCoord(pipe=7, data=1, model=3): 231, ProcessCoord(pipe=7, data=2, model=0): 232, ProcessCoord(pipe=7, data=2, model=1): 233, ProcessCoord(pipe=7, data=2, model=2): 234, ProcessCoord(pipe=7, data=2, model=3): 235, ProcessCoord(pipe=7, data=3, model=0): 236, ProcessCoord(pipe=7, data=3, model=1): 237, ProcessCoord(pipe=7, data=3, model=2): 238, ProcessCoord(pipe=7, data=3, model=3): 239, ProcessCoord(pipe=7, data=4, model=0): 240, ProcessCoord(pipe=7, data=4, model=1): 241, ProcessCoord(pipe=7, data=4, model=2): 242, ProcessCoord(pipe=7, data=4, model=3): 243, ProcessCoord(pipe=7, data=5, model=0): 244, ProcessCoord(pipe=7, data=5, model=1): 245, ProcessCoord(pipe=7, data=5, model=2): 246, ProcessCoord(pipe=7, data=5, model=3): 247, ProcessCoord(pipe=7, data=6, model=0): 248, ProcessCoord(pipe=7, data=6, model=1): 249, ProcessCoord(pipe=7, data=6, model=2): 250, ProcessCoord(pipe=7, data=6, model=3): 251, ProcessCoord(pipe=7, data=7, model=0): 252, ProcessCoord(pipe=7, data=7, model=1): 253, ProcessCoord(pipe=7, data=7, model=2): 254, ProcessCoord(pipe=7, data=7, model=3): 255}
[2021-09-25 02:36:08,503] [INFO] [module.py:360:_partition_layers] Partitioning pipeline stages with method type:transformer
stage=0 layers=7
     0: _to_float16
     1: EmbeddingPipe
     2:
     3: ParallelTransformerLayerPipe
     4: ParallelTransformerLayerPipe
     5: ParallelTransformerLayerPipe
     6: ParallelTransformerLayerPipe
stage=1 layers=4
     7: ParallelTransformerLayerPipe
     8: ParallelTransformerLayerPipe
     9: ParallelTransformerLayerPipe
    10: ParallelTransformerLayerPipe
stage=2 layers=4
    11: ParallelTransformerLayerPipe
    12: ParallelTransformerLayerPipe
    13: ParallelTransformerLayerPipe
    14: ParallelTransformerLayerPipe
stage=3 layers=4
    15: ParallelTransformerLayerPipe
    16: ParallelTransformerLayerPipe
    17: ParallelTransformerLayerPipe
    18: ParallelTransformerLayerPipe
stage=4 layers=4
    19: ParallelTransformerLayerPipe
    20: ParallelTransformerLayerPipe
    21: ParallelTransformerLayerPipe
    22: ParallelTransformerLayerPipe
stage=5 layers=4
    23: ParallelTransformerLayerPipe
    24: ParallelTransformerLayerPipe
    25: ParallelTransformerLayerPipe
    26: ParallelTransformerLayerPipe
stage=6 layers=4
    27: ParallelTransformerLayerPipe
    28: ParallelTransformerLayerPipe
    29: ParallelTransformerLayerPipe
    30: ParallelTransformerLayerPipe
stage=7 layers=8
    31: ParallelTransformerLayerPipe
    32: ParallelTransformerLayerPipe
    33: ParallelTransformerLayerPipe
    34: ParallelTransformerLayerPipe
    35:
    36: MixedFusedLayerNorm
    37: EmbeddingPipe
    38: float16_to_fp32
  loss: CrossEntropy
> number of parameters on (tensor, pipeline) model parallel rank (2, 2): 1745293312
> number of parameters on (tensor, pipeline) model parallel rank (3, 1): 1745293312
> number of parameters on (tensor, pipeline) model parallel rank (2, 1): 1745293312
> number of parameters on (tensor, pipeline) model parallel rank (0, 1): 1745293312
> number of parameters on (tensor, pipeline) model parallel rank (2, 5): 1745293312
> number of parameters on (tensor, pipeline) model parallel rank (1, 5): 1745293312
> number of parameters on (tensor, pipeline) model parallel rank (0, 5): 1745293312
> number of parameters on (tensor, pipeline) model parallel rank (3, 5): 1745293312
> number of parameters on (tensor, pipeline) model parallel rank (2, 4): 1745293312
> number of parameters on (tensor, pipeline) model parallel rank (0, 4): 1745293312
> number of parameters on (tensor, pipeline) model parallel rank (1, 4): 1745293312
> number of parameters on (tensor, pipeline) model parallel rank (3, 4): 1745293312
> number of parameters on (tensor, pipeline) model parallel rank (3, 6): 1745293312
> number of parameters on (tensor, pipeline) model parallel rank (3, 3): 1745293312
> number of parameters on (tensor, pipeline) model parallel rank (2, 6): 1745293312
> number of parameters on (tensor, pipeline) model parallel rank (0, 6): 1745293312
> number of parameters on (tensor, pipeline) model parallel rank (1, 6): 1745293312
> number of parameters on (tensor, pipeline) model parallel rank (3, 2): 1745293312
> number of parameters on (tensor, pipeline) model parallel rank (2, 3): 1745293312
> number of parameters on (tensor, pipeline) model parallel rank (1, 2): 1745293312
> number of parameters on (tensor, pipeline) model parallel rank (0, 2): 1745293312
> number of parameters on (tensor, pipeline) model parallel rank (1, 1): 1745293312
> number of parameters on (tensor, pipeline) model parallel rank (1, 3): 1745293312
> number of parameters on (tensor, pipeline) model parallel rank (0, 3): 1745293312
> number of parameters on (tensor, pipeline) model parallel rank (1, 7): 1986498560
> number of parameters on (tensor, pipeline) model parallel rank (1, 0): 1986465792
> number of parameters on (tensor, pipeline) model parallel rank (2, 0): 1986465792
> number of parameters on (tensor, pipeline) model parallel rank (2, 7): 1986498560
> number of parameters on (tensor, pipeline) model parallel rank (0, 7): 1986498560
> number of parameters on (tensor, pipeline) model parallel rank (3, 0): 1986465792
> number of parameters on (tensor, pipeline) model parallel rank (3, 7): 1986498560
[2021-09-25 02:36:09,735] [INFO] [utils.py:680:see_memory_usage] After Building Model
[2021-09-25 02:36:09,736] [INFO] [utils.py:681:see_memory_usage] MA 3.77 GB Max_MA 3.79 GB CA 3.79 GB Max_CA 4 GB
[2021-09-25 02:36:09,737] [INFO] [utils.py:689:see_memory_usage] CPU Virtual Memory: used = 36.85 GB, percent = 19.7%
> number of parameters on (tensor, pipeline) model parallel rank (0, 0): 1986465792
setting training iterations to 159576
> learning rate decay style: cosine
DeepSpeed is enabled.
[2021-09-25 02:36:09,793] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed info: version=0.4.2+bc17042, git-hash=bc17042, git-branch=big-science
[2021-09-25 02:36:09,889] [INFO] [engine.py:179:__init__] DeepSpeed Flops Profiler Enabled: False
[2021-09-25 02:36:09,889] [INFO] [engine.py:736:_configure_optimizer] Removing param_group that has no 'params' in the client Optimizer
[2021-09-25 02:36:09,890] [INFO] [engine.py:741:_configure_optimizer] Using client Optimizer as basic optimizer
[2021-09-25 02:36:09,890] [INFO] [engine.py:750:_configure_optimizer] DeepSpeed Basic Optimizer = FusedAdam
[2021-09-25 02:36:09,890] [INFO] [utils.py:43:is_zero_supported_optimizer] Checking ZeRO support for optimizer=FusedAdam type=
[2021-09-25 02:36:09,890] [INFO] [logging.py:68:log_dist] [Rank 0] Creating fp16 ZeRO stage 1 optimizer
[2021-09-25 02:36:09,890] [INFO] [stage2.py:106:__init__] Reduce bucket size 500000000
[2021-09-25 02:36:09,890] [INFO] [stage2.py:107:__init__] Allgather bucket size 500000000
[2021-09-25 02:36:09,890] [INFO] [stage2.py:108:__init__] CPU Offload: False
[2021-09-25 02:36:09,890] [INFO] [stage2.py:109:__init__] Round robin gradient partitioning: False
[2021-09-25 02:36:14,495] [INFO] [stage2.py:419:__init__] optimizer state initialized
[2021-09-25 02:36:14,495] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed Final Optimizer = FusedAdam
[2021-09-25 02:36:14,495] [INFO] [engine.py:553:_configure_lr_scheduler] DeepSpeed using
client LR scheduler
[2021-09-25 02:36:14,495] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed LR Scheduler =
[2021-09-25 02:36:14,495] [INFO] [logging.py:68:log_dist] [Rank 0] step=0, skipped=0, lr=[0.0, 0.0], mom=[(0.9, 0.999), (0.9, 0.999)]
[2021-09-25 02:36:14,495] [INFO] [config.py:900:print] DeepSpeedEngine configuration:
[2021-09-25 02:36:14,495] [INFO] [config.py:904:print] activation_checkpointing_config { "partition_activations": false, "contiguous_memory_optimization": false, "cpu_checkpointing": false, "number_checkpoints": null, "synchronize_checkpoint_boundary": false, "profile": false }
[2021-09-25 02:36:14,495] [INFO] [config.py:904:print] aio_config ................... {'block_size': 1048576, 'queue_depth': 8, 'thread_count': 1, 'single_submit': False, 'overlap_events': True}
[2021-09-25 02:36:14,495] [INFO] [config.py:904:print] allreduce_always_fp32 ........ False
[2021-09-25 02:36:14,495] [INFO] [config.py:904:print] amp_enabled .................. False
[2021-09-25 02:36:14,496] [INFO] [config.py:904:print] amp_params ................... False
[2021-09-25 02:36:14,496] [INFO] [config.py:904:print] checkpoint_tag_validation_enabled True
[2021-09-25 02:36:14,496] [INFO] [config.py:904:print] checkpoint_tag_validation_fail False
[2021-09-25 02:36:14,496] [INFO] [config.py:904:print] disable_allgather ............ False
[2021-09-25 02:36:14,496] [INFO] [config.py:904:print] dump_state ................... False
[2021-09-25 02:36:14,496] [INFO] [config.py:904:print] dynamic_loss_scale_args ...... {'init_scale': 4096, 'scale_window': 500, 'delayed_shift': 2, 'min_scale': 1}
[2021-09-25 02:36:14,496] [INFO] [config.py:904:print] eigenvalue_enabled ........... False
[2021-09-25 02:36:14,496] [INFO] [config.py:904:print] eigenvalue_gas_boundary_resolution 1
[2021-09-25 02:36:14,496] [INFO] [config.py:904:print] eigenvalue_layer_name ........ bert.encoder.layer
[2021-09-25 02:36:14,496] [INFO] [config.py:904:print] eigenvalue_layer_num ......... 0
[2021-09-25 02:36:14,496] [INFO] [config.py:904:print] eigenvalue_max_iter .......... 100
[2021-09-25 02:36:14,496] [INFO] [config.py:904:print] eigenvalue_stability ......... 1e-06
[2021-09-25 02:36:14,496] [INFO] [config.py:904:print] eigenvalue_tol ............... 0.01
[2021-09-25 02:36:14,496] [INFO] [config.py:904:print] eigenvalue_verbose ........... False
[2021-09-25 02:36:14,496] [INFO] [config.py:904:print] elasticity_enabled ........... False
[2021-09-25 02:36:14,496] [INFO] [config.py:904:print] flops_profiler_config ........ { "enabled": false, "profile_step": 1, "module_depth": -1, "top_modules": 1, "detailed": true, "output_file": null }
[2021-09-25 02:36:14,496] [INFO] [config.py:904:print] fp16_enabled ................. True
[2021-09-25 02:36:14,496] [INFO] [config.py:904:print] fp16_mixed_quantize .......... False
[2021-09-25 02:36:14,496] [INFO] [config.py:904:print] global_rank .................. 0
[2021-09-25 02:36:14,496] [INFO] [config.py:904:print] gradient_accumulation_steps .. 256
[2021-09-25 02:36:14,496] [INFO] [config.py:904:print] gradient_clipping ............ 1.0
[2021-09-25 02:36:14,496] [INFO] [config.py:904:print] gradient_predivide_factor .... 1.0
[2021-09-25 02:36:14,496] [INFO] [config.py:904:print] initial_dynamic_scale ........ 4096
[2021-09-25 02:36:14,496] [INFO] [config.py:904:print] loss_scale ................... 0
[2021-09-25 02:36:14,496] [INFO] [config.py:904:print] memory_breakdown ............. False
[2021-09-25 02:36:14,496] [INFO] [config.py:904:print] optimizer_legacy_fusion ...... False
[2021-09-25 02:36:14,496] [INFO] [config.py:904:print] optimizer_name ............... None
[2021-09-25 02:36:14,496] [INFO] [config.py:904:print] optimizer_params ............. None
[2021-09-25 02:36:14,496] [INFO] [config.py:904:print] pipeline ..................... {'stages': 'auto', 'partition': 'best', 'seed_layers': False, 'activation_checkpoint_interval': 0}
[2021-09-25 02:36:14,496] [INFO] [config.py:904:print] pld_enabled .................. False
[2021-09-25 02:36:14,496] [INFO] [config.py:904:print] pld_params ................... False
[2021-09-25 02:36:14,496] [INFO] [config.py:904:print] prescale_gradients ........... False
[2021-09-25 02:36:14,496] [INFO] [config.py:904:print] quantize_change_rate ......... 0.001
[2021-09-25 02:36:14,496] [INFO] [config.py:904:print] quantize_groups .............. 1
[2021-09-25 02:36:14,496] [INFO] [config.py:904:print] quantize_offset .............. 1000
[2021-09-25 02:36:14,496] [INFO] [config.py:904:print] quantize_period .............. 1000
[2021-09-25 02:36:14,496] [INFO] [config.py:904:print] quantize_rounding ............ 0
[2021-09-25 02:36:14,497] [INFO] [config.py:904:print] quantize_start_bits .......... 16
[2021-09-25 02:36:14,497] [INFO] [config.py:904:print] quantize_target_bits ......... 8
[2021-09-25 02:36:14,497] [INFO] [config.py:904:print] quantize_training_enabled .... False
[2021-09-25 02:36:14,497] [INFO] [config.py:904:print] quantize_type ................ 0
[2021-09-25 02:36:14,497] [INFO] [config.py:904:print] quantize_verbose ............. False
[2021-09-25 02:36:14,497] [INFO] [config.py:904:print] scheduler_name ............... None
[2021-09-25 02:36:14,497] [INFO] [config.py:904:print] scheduler_params ............. None
[2021-09-25 02:36:14,497] [INFO] [config.py:904:print] sparse_attention ............. None
[2021-09-25 02:36:14,497] [INFO] [config.py:904:print] sparse_gradients_enabled ..... False
[2021-09-25 02:36:14,497] [INFO] [config.py:904:print] steps_per_print .............. 2000
[2021-09-25 02:36:14,497] [INFO] [config.py:904:print] tensorboard_enabled .......... False
[2021-09-25 02:36:14,497] [INFO] [config.py:904:print] tensorboard_job_name ......... DeepSpeedJobName
[2021-09-25 02:36:14,497] [INFO] [config.py:904:print] tensorboard_output_path ......
[2021-09-25 02:36:14,497] [INFO] [config.py:904:print] train_batch_size ............. 2048
[2021-09-25 02:36:14,497] [INFO] [config.py:904:print] train_micro_batch_size_per_gpu 1
[2021-09-25 02:36:14,497] [INFO] [config.py:904:print] use_quantizer_kernel ......... False
[2021-09-25 02:36:14,497] [INFO] [config.py:904:print] wall_clock_breakdown ......... False
[2021-09-25 02:36:14,497] [INFO] [config.py:904:print] world_size ................... 8
[2021-09-25 02:36:14,497] [INFO] [config.py:904:print] zero_allow_untested_optimizer False
[2021-09-25 02:36:14,497] [INFO] [config.py:904:print] zero_config .................. { "stage": 1, "contiguous_gradients": false, "reduce_scatter": true, "reduce_bucket_size": 5.000000e+08, "allgather_partitions": true, "allgather_bucket_size": 5.000000e+08, "overlap_comm": false, "load_from_fp32_weights": true, "elastic_checkpoint": true, "offload_param": null, "offload_optimizer": null, "sub_group_size": 1.000000e+09, "prefetch_bucket_size": 5.000000e+07, "param_persistence_threshold": 1.000000e+05, "max_live_parameters": 1.000000e+09, "max_reuse_distance": 1.000000e+09, "gather_fp16_weights_on_model_save": false, "ignore_unused_parameters": true, "round_robin_gradients": false, "legacy_stage1": false }
[2021-09-25 02:36:14,497] [INFO] [config.py:904:print] zero_enabled ................. True
[2021-09-25 02:36:14,497] [INFO] [config.py:904:print] zero_optimization_stage ...... 1
[2021-09-25 02:36:14,497] [INFO] [config.py:906:print] json = { "train_micro_batch_size_per_gpu": 1, "train_batch_size": 2.048000e+03, "gradient_clipping": 1.0, "zero_optimization": { "stage": 1 }, "fp16": { "enabled": true, "loss_scale": 0, "loss_scale_window": 500, "hysteresis": 2, "min_loss_scale": 1, "initial_scale_power": 12 }, "steps_per_print": 2.000000e+03, "wall_clock_breakdown": false }
[2021-09-25 02:36:14,497] [INFO] [engine.py:76:__init__] CONFIG: micro_batches=256 micro_batch_size=1
[2021-09-25 02:36:15,038] [INFO] [engine.py:134:__init__] RANK=0 STAGE=0 LAYERS=7 [0, 7) STAGE_PARAMS=1986465792 (1986.466M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-25 02:36:15,038] [INFO] [engine.py:134:__init__] RANK=1 STAGE=0 LAYERS=7 [0, 7) STAGE_PARAMS=1986465792 (1986.466M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-25 02:36:15,038] [INFO] [engine.py:134:__init__] RANK=2 STAGE=0 LAYERS=7 [0, 7) STAGE_PARAMS=1986465792 (1986.466M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-25 02:36:15,038] [INFO] [engine.py:134:__init__] RANK=3 STAGE=0 LAYERS=7 [0, 7) STAGE_PARAMS=1986465792 (1986.466M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-25 02:36:15,039] [INFO] [engine.py:134:__init__] RANK=67 STAGE=2 LAYERS=4 [11, 15) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-25 02:36:15,039] [INFO] [engine.py:134:__init__] RANK=64 STAGE=2 LAYERS=4 [11, 15) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-25 02:36:15,039] [INFO] [engine.py:134:__init__] RANK=65 STAGE=2 LAYERS=4 [11, 15) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-25 02:36:15,039] [INFO] [engine.py:134:__init__] RANK=66
STAGE=2 LAYERS=4 [11, 15) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-25 02:36:15,039] [INFO] [engine.py:134:__init__] RANK=193 STAGE=6 LAYERS=4 [27, 31) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-25 02:36:15,039] [INFO] [engine.py:134:__init__] RANK=194 STAGE=6 LAYERS=4 [27, 31) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-25 02:36:15,039] [INFO] [engine.py:134:__init__] RANK=195 STAGE=6 LAYERS=4 [27, 31) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-25 02:36:15,039] [INFO] [engine.py:134:__init__] RANK=129 STAGE=4 LAYERS=4 [19, 23) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-25 02:36:15,039] [INFO] [engine.py:134:__init__] RANK=130 STAGE=4 LAYERS=4 [19, 23) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-25 02:36:15,039] [INFO] [engine.py:134:__init__] RANK=128 STAGE=4 LAYERS=4 [19, 23) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-25 02:36:15,039] [INFO] [engine.py:134:__init__] RANK=227 STAGE=7 LAYERS=8 [31, 39) STAGE_PARAMS=1986498560 (1986.499M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-25 02:36:15,039] [INFO] [engine.py:134:__init__] RANK=224 STAGE=7 LAYERS=8 [31, 39) STAGE_PARAMS=1986498560 (1986.499M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-25 02:36:15,039] [INFO] [engine.py:134:__init__] RANK=225 STAGE=7 LAYERS=8 [31, 39) STAGE_PARAMS=1986498560 (1986.499M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-25 02:36:15,039] [INFO] [engine.py:134:__init__] RANK=226 STAGE=7 LAYERS=8 [31, 39) STAGE_PARAMS=1986498560 (1986.499M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-25 02:36:15,039] [INFO] [engine.py:134:__init__] RANK=98 STAGE=3 LAYERS=4 [15, 19) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-25 02:36:15,039] [INFO] [engine.py:134:__init__] RANK=160 STAGE=5 LAYERS=4 [23, 27) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-25 02:36:15,039] [INFO] [engine.py:134:__init__] RANK=163 STAGE=5 LAYERS=4 [23, 27) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-25 02:36:15,039] [INFO] [engine.py:134:__init__] RANK=161 STAGE=5 LAYERS=4 [23, 27) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-25 02:36:15,039] [INFO] [engine.py:134:__init__] RANK=162 STAGE=5 LAYERS=4 [23, 27) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-25 02:36:15,039] [INFO] [engine.py:134:__init__] RANK=192 STAGE=6 LAYERS=4 [27, 31) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-25 02:36:15,039] [INFO] [engine.py:134:__init__] RANK=131 STAGE=4 LAYERS=4 [19, 23) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-25 02:36:15,039] [INFO] [engine.py:134:__init__] RANK=34 STAGE=1 LAYERS=4 [7, 11) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-25 02:36:15,039] [INFO] [engine.py:134:__init__] RANK=32 STAGE=1 LAYERS=4 [7, 11) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-25 02:36:15,039] [INFO] [engine.py:134:__init__] RANK=35 STAGE=1 LAYERS=4 [7, 11) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-25 02:36:15,039] [INFO] [engine.py:134:__init__] RANK=33 STAGE=1 LAYERS=4 [7, 11) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-25 02:36:15,039] [INFO] [engine.py:134:__init__] RANK=97 STAGE=3 LAYERS=4 [15, 19) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-25 02:36:15,039] [INFO] [engine.py:134:__init__] RANK=99 STAGE=3 LAYERS=4 [15, 19) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-25 02:36:15,039] [INFO] [engine.py:134:__init__] RANK=96 STAGE=3 LAYERS=4 [15, 19) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
> using checkpoint value 6e-05 for learning rate
> using checkpoint value 6e-06 for minimum learning rate
> using checkpoint value 216320 for warmup iterations
> using checkpoint value 126953125 for total number of iterations
> using checkpoint value cosine for decay style
successfully loaded 8 ZeRO state_dicts for rank 180
successfully loaded 8 ZeRO state_dicts for rank 108
successfully loaded 8 ZeRO state_dicts for rank 206
successfully loaded 8 ZeRO state_dicts for rank 168
successfully loaded 8 ZeRO state_dicts for rank 167
successfully loaded 8 ZeRO state_dicts for rank 183
successfully loaded 8 ZeRO state_dicts for rank 112
successfully loaded 8 ZeRO state_dicts for rank 60
successfully loaded 8 ZeRO state_dicts for rank 56
successfully loaded 8 ZeRO state_dicts for rank 63
successfully loaded 8 ZeRO state_dicts for rank 222
successfully loaded 8 ZeRO state_dicts for rank 52
successfully loaded 8 ZeRO state_dicts for rank 177
successfully loaded 8 ZeRO state_dicts for rank 104
successfully loaded 8 ZeRO state_dicts for rank 164
successfully loaded 8 ZeRO state_dicts for rank 176
successfully loaded 8 ZeRO state_dicts for rank 110
successfully loaded 8 ZeRO state_dicts for rank 58
successfully loaded 8 ZeRO state_dicts for rank 178
successfully loaded 8 ZeRO state_dicts for rank 184
successfully loaded 8 ZeRO state_dicts for rank 116
successfully loaded 8 ZeRO state_dicts for rank 127
successfully loaded 8 ZeRO state_dicts for rank 96
successfully loaded 8 ZeRO state_dicts for rank 172
successfully loaded 8 ZeRO state_dicts for rank 188
successfully loaded 8 ZeRO state_dicts for rank 61
successfully loaded 8 ZeRO state_dicts for rank 182
successfully loaded 8 ZeRO state_dicts for rank 204
successfully loaded 8 ZeRO state_dicts for rank 62
successfully loaded 8 ZeRO state_dicts for rank 170
successfully loaded 8 ZeRO state_dicts for rank 124
successfully loaded 8 ZeRO state_dicts for rank 109
successfully loaded 8 ZeRO state_dicts for rank 44
successfully loaded 8 ZeRO state_dicts for rank 166
successfully loaded 8 ZeRO state_dicts for rank 59
successfully loaded 8 ZeRO state_dicts for rank 113
successfully loaded 8 ZeRO state_dicts for rank 200
successfully loaded 8 ZeRO state_dicts for rank 185
successfully loaded 8 ZeRO state_dicts for rank 15
successfully loaded 8 ZeRO state_dicts for rank 214
successfully loaded 8 ZeRO state_dicts for rank 143
successfully loaded 8 ZeRO state_dicts for rank 171
successfully loaded 8 ZeRO state_dicts for rank 169
successfully loaded 8 ZeRO state_dicts for rank 20
successfully loaded 8 ZeRO state_dicts for rank 198
successfully loaded 8 ZeRO state_dicts for rank 161
successfully loaded 8 ZeRO state_dicts for rank 57
successfully loaded 8 ZeRO state_dicts for rank 220
successfully loaded 8 ZeRO state_dicts for rank 158
successfully loaded 8 ZeRO state_dicts for rank 81
successfully loaded 8 ZeRO state_dicts for rank 111
successfully loaded 8 ZeRO state_dicts for rank 120
successfully loaded 8 ZeRO state_dicts for rank 211
successfully loaded 8 ZeRO state_dicts for rank 221
successfully loaded 8 ZeRO state_dicts for rank 16
successfully loaded 8 ZeRO state_dicts for rank 186
successfully loaded 8 ZeRO state_dicts for rank 223
successfully loaded 8 ZeRO state_dicts for rank 93
successfully loaded 8 ZeRO state_dicts for rank 95
successfully loaded 8 ZeRO state_dicts for rank 105
successfully loaded 8 ZeRO state_dicts for rank 21
successfully loaded 8 ZeRO state_dicts for rank 207
successfully loaded 8 ZeRO state_dicts for rank 107
successfully loaded 8 ZeRO state_dicts for rank 194
successfully loaded 8 ZeRO state_dicts for rank 142
successfully loaded 8 ZeRO state_dicts for rank 51
successfully loaded 8 ZeRO state_dicts for rank 209
successfully loaded 8 ZeRO state_dicts for rank 128
successfully loaded 8 ZeRO state_dicts for rank 160
successfully loaded 8 ZeRO state_dicts for rank 83
successfully loaded 8 ZeRO state_dicts for rank 97
successfully loaded 8 ZeRO state_dicts for rank 76
successfully loaded 8 ZeRO state_dicts for rank 135
successfully loaded 8 ZeRO state_dicts for rank 100
successfully loaded 8 ZeRO state_dicts for rank 174
successfully loaded 8 ZeRO state_dicts for rank 23
successfully loaded 8 ZeRO state_dicts for rank 121
successfully loaded 8 ZeRO state_dicts for rank 80
successfully loaded 8 ZeRO state_dicts for rank 75
successfully loaded 8 ZeRO state_dicts for rank 140
successfully loaded 8 ZeRO state_dicts for rank 205
loading 8 zero partition checkpoints for rank 180
successfully loaded 8 ZeRO state_dicts for rank 190
successfully loaded 8 ZeRO state_dicts for rank 215
successfully loaded 8 ZeRO state_dicts for rank 48
successfully loaded 8 ZeRO state_dicts for rank 202
successfully loaded 8 ZeRO state_dicts for rank 196
loading 8 zero partition checkpoints for rank 206
successfully loaded 8 ZeRO state_dicts for rank 165
loading 8 zero
partition checkpoints for rank 108 successfully loaded 8 ZeRO state_dicts for rank 179 successfully loaded 8 ZeRO state_dicts for rank 175 successfully loaded 8 ZeRO state_dicts for rank 187 successfully loaded 8 ZeRO state_dicts for rank 126 successfully loaded 8 ZeRO state_dicts for rank 13 successfully loaded 8 ZeRO state_dicts for rank 36 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-25 02:36:41 CEST)" was missed by 0:00:03.258297 successfully loaded 8 ZeRO state_dicts for rank 199 successfully loaded 8 ZeRO state_dicts for rank 55 successfully loaded 8 ZeRO state_dicts for rank 99 successfully loaded 8 ZeRO state_dicts for rank 115 successfully loaded 8 ZeRO state_dicts for rank 72 successfully loaded 8 ZeRO state_dicts for rank 162 successfully loaded 8 ZeRO state_dicts for rank 203 successfully loaded 8 ZeRO state_dicts for rank 22 successfully loaded 8 ZeRO state_dicts for rank 210 loading 8 zero partition checkpoints for rank 183 successfully loaded 8 ZeRO state_dicts for rank 82 successfully loaded 8 ZeRO state_dicts for rank 35 successfully loaded 8 ZeRO state_dicts for rank 129 successfully loaded 8 ZeRO state_dicts for rank 131 successfully loaded 8 ZeRO state_dicts for rank 192 successfully loaded 8 ZeRO state_dicts for rank 130 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-25 02:36:42 CEST)" was missed by 0:00:03.400033 successfully loaded 8 ZeRO state_dicts for rank 156 successfully loaded 8 ZeRO state_dicts for rank 157 successfully loaded 8 ZeRO state_dicts for rank 208 successfully loaded 8 ZeRO state_dicts for rank 141 successfully loaded 8 ZeRO state_dicts for rank 181 successfully loaded 8 ZeRO state_dicts for rank 92 loading 8 zero partition checkpoints for rank 167 successfully loaded 8 ZeRO state_dicts for rank 12 successfully loaded 8 ZeRO state_dicts 
for rank 18 successfully loaded 8 ZeRO state_dicts for rank 118 successfully loaded 8 ZeRO state_dicts for rank 19 successfully loaded 8 ZeRO state_dicts for rank 32 successfully loaded 8 ZeRO state_dicts for rank 173 successfully loaded 8 ZeRO state_dicts for rank 236 successfully loaded 8 ZeRO state_dicts for rank 224 successfully loaded 8 ZeRO state_dicts for rank 132 successfully loaded 8 ZeRO state_dicts for rank 195 successfully loaded 8 ZeRO state_dicts for rank 69 successfully loaded 8 ZeRO state_dicts for rank 65 successfully loaded 8 ZeRO state_dicts for rank 41 successfully loaded 8 ZeRO state_dicts for rank 189 successfully loaded 8 ZeRO state_dicts for rank 71 successfully loaded 8 ZeRO state_dicts for rank 79 successfully loaded 8 ZeRO state_dicts for rank 87 successfully loaded 8 ZeRO state_dicts for rank 138 successfully loaded 8 ZeRO state_dicts for rank 212 successfully loaded 8 ZeRO state_dicts for rank 197 successfully loaded 8 ZeRO state_dicts for rank 8 successfully loaded 8 ZeRO state_dicts for rank 134 successfully loaded 8 ZeRO state_dicts for rank 14 successfully loaded 8 ZeRO state_dicts for rank 39 successfully loaded 8 ZeRO state_dicts for rank 201 successfully loaded 8 ZeRO state_dicts for rank 88 successfully loaded 8 ZeRO state_dicts for rank 125 successfully loaded 8 ZeRO state_dicts for rank 91 successfully loaded 8 ZeRO state_dicts for rank 163 successfully loaded 8 ZeRO state_dicts for rank 114 successfully loaded 8 ZeRO state_dicts for rank 237 successfully loaded 8 ZeRO state_dicts for rank 45 successfully loaded 8 ZeRO state_dicts for rank 193 successfully loaded 8 ZeRO state_dicts for rank 106 loading 8 zero partition checkpoints for rank 56 successfully loaded 8 ZeRO state_dicts for rank 218 successfully loaded 8 ZeRO state_dicts for rank 243 successfully loaded 8 ZeRO state_dicts for rank 25 successfully loaded 8 ZeRO state_dicts for rank 98 successfully loaded 8 ZeRO state_dicts for rank 245 loading 8 zero partition 
checkpoints for rank 112 successfully loaded 8 ZeRO state_dicts for rank 240 successfully loaded 8 ZeRO state_dicts for rank 213 loading 8 zero partition checkpoints for rank 60 successfully loaded 8 ZeRO state_dicts for rank 103 successfully loaded 8 ZeRO state_dicts for rank 0 successfully loaded 8 ZeRO state_dicts for rank 191 successfully loaded 8 ZeRO state_dicts for rank 149 successfully loaded 8 ZeRO state_dicts for rank 252 successfully loaded 8 ZeRO state_dicts for rank 67 successfully loaded 8 ZeRO state_dicts for rank 136 successfully loaded 8 ZeRO state_dicts for rank 49 successfully loaded 8 ZeRO state_dicts for rank 54 successfully loaded 8 ZeRO state_dicts for rank 119 successfully loaded 8 ZeRO state_dicts for rank 77 successfully loaded 8 ZeRO state_dicts for rank 73 loading 8 zero partition checkpoints for rank 168 successfully loaded 8 ZeRO state_dicts for rank 238 successfully loaded 8 ZeRO state_dicts for rank 139 successfully loaded 8 ZeRO state_dicts for rank 159 successfully loaded 8 ZeRO state_dicts for rank 94 successfully loaded 8 ZeRO state_dicts for rank 147 successfully loaded 8 ZeRO state_dicts for rank 3 successfully loaded 8 ZeRO state_dicts for rank 27 successfully loaded 8 ZeRO state_dicts for rank 233 successfully loaded 8 ZeRO state_dicts for rank 84 successfully loaded 8 ZeRO state_dicts for rank 17 successfully loaded 8 ZeRO state_dicts for rank 117 successfully loaded 8 ZeRO state_dicts for rank 137 successfully loaded 8 ZeRO state_dicts for rank 144 successfully loaded 8 ZeRO state_dicts for rank 133 successfully loaded 8 ZeRO state_dicts for rank 11 successfully loaded 8 ZeRO state_dicts for rank 101 successfully loaded 8 ZeRO state_dicts for rank 248 successfully loaded 8 ZeRO state_dicts for rank 90 successfully loaded 8 ZeRO state_dicts for rank 40 successfully loaded 8 ZeRO state_dicts for rank 46 successfully loaded 8 ZeRO state_dicts for rank 47 successfully loaded 8 ZeRO state_dicts for rank 78 successfully loaded 8 
ZeRO state_dicts for rank 242 loading 8 zero partition checkpoints for rank 52 successfully loaded 8 ZeRO state_dicts for rank 239 successfully loaded 8 ZeRO state_dicts for rank 9 successfully loaded 8 ZeRO state_dicts for rank 74 successfully loaded 8 ZeRO state_dicts for rank 225 successfully loaded 8 ZeRO state_dicts for rank 68 successfully loaded 8 ZeRO state_dicts for rank 146 loading 8 zero partition checkpoints for rank 104 successfully loaded 8 ZeRO state_dicts for rank 122 successfully loaded 8 ZeRO state_dicts for rank 2 successfully loaded 8 ZeRO state_dicts for rank 123 successfully loaded 8 ZeRO state_dicts for rank 37 successfully loaded 8 ZeRO state_dicts for rank 53 loading 8 zero partition checkpoints for rank 176 successfully loaded 8 ZeRO state_dicts for rank 150 successfully loaded 8 ZeRO state_dicts for rank 28 successfully loaded 8 ZeRO state_dicts for rank 86 successfully loaded 8 ZeRO state_dicts for rank 234 successfully loaded 8 ZeRO state_dicts for rank 244 successfully loaded 8 ZeRO state_dicts for rank 226 loading 8 zero partition checkpoints for rank 110 successfully loaded 8 ZeRO state_dicts for rank 145 successfully loaded 8 ZeRO state_dicts for rank 228 successfully loaded 8 ZeRO state_dicts for rank 217 successfully loaded 8 ZeRO state_dicts for rank 216 successfully loaded 8 ZeRO state_dicts for rank 152 loading 8 zero partition checkpoints for rank 184 successfully loaded 8 ZeRO state_dicts for rank 227 successfully loaded 8 ZeRO state_dicts for rank 85 successfully loaded 8 ZeRO state_dicts for rank 154 loading 8 zero partition checkpoints for rank 127 successfully loaded 8 ZeRO state_dicts for rank 24 successfully loaded 8 ZeRO state_dicts for rank 241 loading 8 zero partition checkpoints for rank 96 successfully loaded 8 ZeRO state_dicts for rank 1 successfully loaded 8 ZeRO state_dicts for rank 64 successfully loaded 8 ZeRO state_dicts for rank 50 successfully loaded 8 ZeRO state_dicts for rank 42 successfully loaded 8 ZeRO 
state_dicts for rank 232 loading 8 zero partition checkpoints for rank 116 loading 8 zero partition checkpoints for rank 172 successfully loaded 8 ZeRO state_dicts for rank 33 successfully loaded 8 ZeRO state_dicts for rank 10 successfully loaded 8 ZeRO state_dicts for rank 31 successfully loaded 8 ZeRO state_dicts for rank 38 loading 8 zero partition checkpoints for rank 63 successfully loaded 8 ZeRO state_dicts for rank 89 successfully loaded 8 ZeRO state_dicts for rank 249 successfully loaded 8 ZeRO state_dicts for rank 246 loading 8 zero partition checkpoints for rank 58 successfully loaded 8 ZeRO state_dicts for rank 151 loading 8 zero partition checkpoints for rank 204 successfully loaded 8 ZeRO state_dicts for rank 155 successfully loaded 8 ZeRO state_dicts for rank 34 successfully loaded 8 ZeRO state_dicts for rank 250 successfully loaded 8 ZeRO state_dicts for rank 102 successfully loaded 8 ZeRO state_dicts for rank 230 successfully loaded 8 ZeRO state_dicts for rank 70 successfully loaded 8 ZeRO state_dicts for rank 26 successfully loaded 8 ZeRO state_dicts for rank 29 loading 8 zero partition checkpoints for rank 62 loading 8 zero partition checkpoints for rank 182 loading 8 zero partition checkpoints for rank 124 loading 8 zero partition checkpoints for rank 109 successfully loaded 8 ZeRO state_dicts for rank 247 successfully loaded 8 ZeRO state_dicts for rank 148 successfully loaded 8 ZeRO state_dicts for rank 30 successfully loaded 8 ZeRO state_dicts for rank 153 loading 8 zero partition checkpoints for rank 113 successfully loaded 8 ZeRO state_dicts for rank 251 successfully loaded 8 ZeRO state_dicts for rank 43 successfully loaded 8 ZeRO state_dicts for rank 235 loading 8 zero partition checkpoints for rank 177 loading 8 zero partition checkpoints for rank 200 loading 8 zero partition checkpoints for rank 214 successfully loaded 8 ZeRO state_dicts for rank 254 successfully loaded 8 ZeRO state_dicts for rank 229 loading 8 zero partition checkpoints 
for rank 164 loading 8 zero partition checkpoints for rank 44 loading 8 zero partition checkpoints for rank 211 loading 8 zero partition checkpoints for rank 111 loading 8 zero partition checkpoints for rank 221 loading 8 zero partition checkpoints for rank 143 successfully loaded 8 ZeRO state_dicts for rank 66 loading 8 zero partition checkpoints for rank 188 loading 8 zero partition checkpoints for rank 194 loading 8 zero partition checkpoints for rank 81 loading 8 zero partition checkpoints for rank 15 successfully loaded 8 ZeRO state_dicts for rank 231 loading 8 zero partition checkpoints for rank 207 loading 8 zero partition checkpoints for rank 107 loading 8 zero partition checkpoints for rank 160 successfully loaded 8 ZeRO state_dicts for rank 253 loading 8 zero partition checkpoints for rank 105 loading 8 zero partition checkpoints for rank 186 loading 8 zero partition checkpoints for rank 223 loading 8 zero partition checkpoints for rank 95 loading 8 zero partition checkpoints for rank 174 loading 8 zero partition checkpoints for rank 51 loading 8 zero partition checkpoints for rank 61 loading 8 zero partition checkpoints for rank 120 loading 8 zero partition checkpoints for rank 135 loading 8 zero partition checkpoints for rank 97 loading 8 zero partition checkpoints for rank 140 loading 8 zero partition checkpoints for rank 16 loading 8 zero partition checkpoints for rank 198 loading 8 zero partition checkpoints for rank 100 loading 8 zero partition checkpoints for rank 171 loading 8 zero partition checkpoints for rank 205 loading 8 zero partition checkpoints for rank 76 successfully loaded 8 ZeRO state_dicts for rank 219 successfully loaded 8 ZeRO state_dicts for rank 255 loading 8 zero partition checkpoints for rank 20 loading 8 zero partition checkpoints for rank 126 loading 8 zero partition checkpoints for rank 55 loading 8 zero partition checkpoints for rank 175 loading 8 zero partition checkpoints for rank 99 loading 8 zero partition checkpoints 
for rank 36 loading 8 zero partition checkpoints for rank 199 loading 8 zero partition checkpoints for rank 166 loading 8 zero partition checkpoints for rank 158 loading 8 zero partition checkpoints for rank 157 loading 8 zero partition checkpoints for rank 82 loading 8 zero partition checkpoints for rank 129 loading 8 zero partition checkpoints for rank 222 loading 8 zero partition checkpoints for rank 215 loading 8 zero partition checkpoints for rank 121 loading 8 zero partition checkpoints for rank 115 loading 8 zero partition checkpoints for rank 181 loading 8 zero partition checkpoints for rank 134 loading 8 zero partition checkpoints for rank 21 loading 8 zero partition checkpoints for rank 87 loading 8 zero partition checkpoints for rank 201 loading 8 zero partition checkpoints for rank 197 loading 8 zero partition checkpoints for rank 13 loading 8 zero partition checkpoints for rank 173 loading 8 zero partition checkpoints for rank 132 loading 8 zero partition checkpoints for rank 195 loading 8 zero partition checkpoints for rank 178 loading 8 zero partition checkpoints for rank 69 loading 8 zero partition checkpoints for rank 65 loading 8 zero partition checkpoints for rank 125 loading 8 zero partition checkpoints for rank 138 loading 8 zero partition checkpoints for rank 208 loading 8 zero partition checkpoints for rank 45 loading 8 zero partition checkpoints for rank 39 loading 8 zero partition checkpoints for rank 196 loading 8 zero partition checkpoints for rank 130 loading 8 zero partition checkpoints for rank 35 loading 8 zero partition checkpoints for rank 165 loading 8 zero partition checkpoints for rank 209 loading 8 zero partition checkpoints for rank 213 loading 8 zero partition checkpoints for rank 190 loading 8 zero partition checkpoints for rank 189 loading 8 zero partition checkpoints for rank 114 loading 8 zero partition checkpoints for rank 12 loading 8 zero partition checkpoints for rank 54 loading 8 zero partition checkpoints for rank 98 
loading 8 zero partition checkpoints for rank 49 loading 8 zero partition checkpoints for rank 142 loading 8 zero partition checkpoints for rank 136 loading 8 zero partition checkpoints for rank 19 loading 8 zero partition checkpoints for rank 163 loading 8 zero partition checkpoints for rank 159 loading 8 zero partition checkpoints for rank 94 loading 8 zero partition checkpoints for rank 88 loading 8 zero partition checkpoints for rank 67 loading 8 zero partition checkpoints for rank 106 loading 8 zero partition checkpoints for rank 149 loading 8 zero partition checkpoints for rank 73 loading 8 zero partition checkpoints for rank 218 loading 8 zero partition checkpoints for rank 59 loading 8 zero partition checkpoints for rank 139 loading 8 zero partition checkpoints for rank 137 loading 8 zero partition checkpoints for rank 212 loading 8 zero partition checkpoints for rank 14 loading 8 zero partition checkpoints for rank 144 loading 8 zero partition checkpoints for rank 57 loading 8 zero partition checkpoints for rank 191 loading 8 zero partition checkpoints for rank 23 loading 8 zero partition checkpoints for rank 133 loading 8 zero partition checkpoints for rank 117 loading 8 zero partition checkpoints for rank 220 loading 8 zero partition checkpoints for rank 40 loading 8 zero partition checkpoints for rank 122 loading 8 zero partition checkpoints for rank 179 loading 8 zero partition checkpoints for rank 78 loading 8 zero partition checkpoints for rank 83 loading 8 zero partition checkpoints for rank 150 loading 8 zero partition checkpoints for rank 156 loading 8 zero partition checkpoints for rank 245 loading 8 zero partition checkpoints for rank 141 loading 8 zero partition checkpoints for rank 210 loading 8 zero partition checkpoints for rank 123 loading 8 zero partition checkpoints for rank 37 loading 8 zero partition checkpoints for rank 243 loading 8 zero partition checkpoints for rank 74 loading 8 zero partition checkpoints for rank 68 loading 8 zero 
partition checkpoints for rank 217 loading 8 zero partition checkpoints for rank 185 loading 8 zero partition checkpoints for rank 237 loading 8 zero partition checkpoints for rank 192 loading 8 zero partition checkpoints for rank 161 loading 8 zero partition checkpoints for rank 90 loading 8 zero partition checkpoints for rank 80 loading 8 zero partition checkpoints for rank 3 loading 8 zero partition checkpoints for rank 93 loading 8 zero partition checkpoints for rank 53 loading 8 zero partition checkpoints for rank 101 loading 8 zero partition checkpoints for rank 118 loading 8 zero partition checkpoints for rank 71 loading 8 zero partition checkpoints for rank 33 loading 8 zero partition checkpoints for rank 9 loading 8 zero partition checkpoints for rank 239 loading 8 zero partition checkpoints for rank 48 loading 8 zero partition checkpoints for rank 86 loading 8 zero partition checkpoints for rank 187 loading 8 zero partition checkpoints for rank 64 loading 8 zero partition checkpoints for rank 170 loading 8 zero partition checkpoints for rank 11 loading 8 zero partition checkpoints for rank 145 loading 8 zero partition checkpoints for rank 38 loading 8 zero partition checkpoints for rank 152 loading 8 zero partition checkpoints for rank 22 loading 8 zero partition checkpoints for rank 155 loading 8 zero partition checkpoints for rank 2 loading 8 zero partition checkpoints for rank 226 loading 8 zero partition checkpoints for rank 244 loading 8 zero partition checkpoints for rank 75 loading 8 zero partition checkpoints for rank 79 loading 8 zero partition checkpoints for rank 84 loading 8 zero partition checkpoints for rank 193 loading 8 zero partition checkpoints for rank 227 loading 8 zero partition checkpoints for rank 46 loading 8 zero partition checkpoints for rank 47 loading 8 zero partition checkpoints for rank 119 loading 8 zero partition checkpoints for rank 234 loading 8 zero partition checkpoints for rank 24 loading 8 zero partition checkpoints 
for rank 92 loading 8 zero partition checkpoints for rank 169 loading 8 zero partition checkpoints for rank 43 loading 8 zero partition checkpoints for rank 18 loading 8 zero partition checkpoints for rank 241 loading 8 zero partition checkpoints for rank 128 loading 8 zero partition checkpoints for rank 77 loading 8 zero partition checkpoints for rank 162 loading 8 zero partition checkpoints for rank 246 loading 8 zero partition checkpoints for rank 151 loading 8 zero partition checkpoints for rank 72 loading 8 zero partition checkpoints for rank 41 loading 8 zero partition checkpoints for rank 91 loading 8 zero partition checkpoints for rank 26 loading 8 zero partition checkpoints for rank 147 loading 8 zero partition checkpoints for rank 224 loading 8 zero partition checkpoints for rank 50 loading 8 zero partition checkpoints for rank 216 loading 8 zero partition checkpoints for rank 85 loading 8 zero partition checkpoints for rank 148 loading 8 zero partition checkpoints for rank 131 loading 8 zero partition checkpoints for rank 32 loading 8 zero partition checkpoints for rank 247 loading 8 zero partition checkpoints for rank 8 loading 8 zero partition checkpoints for rank 66 loading 8 zero partition checkpoints for rank 229 loading 8 zero partition checkpoints for rank 146 loading 8 zero partition checkpoints for rank 235 loading 8 zero partition checkpoints for rank 17 loading 8 zero partition checkpoints for rank 236 loading 8 zero partition checkpoints for rank 254 loading 8 zero partition checkpoints for rank 202 loading 8 zero partition checkpoints for rank 70 loading 8 zero partition checkpoints for rank 42 loading 8 zero partition checkpoints for rank 250 loading 8 zero partition checkpoints for rank 89 loading 8 zero partition checkpoints for rank 251 loading 8 zero partition checkpoints for rank 228 loading 8 zero partition checkpoints for rank 103 loading 8 zero partition checkpoints for rank 225 loading 8 zero partition checkpoints for rank 34 
loading 8 zero partition checkpoints for rank 203 loading 8 zero partition checkpoints for rank 231 loading 8 zero partition checkpoints for rank 25 loading 8 zero partition checkpoints for rank 238 loading 8 zero partition checkpoints for rank 255 loading 8 zero partition checkpoints for rank 102 loading 8 zero partition checkpoints for rank 154 loading 8 zero partition checkpoints for rank 219 loading 8 zero partition checkpoints for rank 230 loading 8 zero partition checkpoints for rank 240 loading 8 zero partition checkpoints for rank 10 loading 8 zero partition checkpoints for rank 27 loading 8 zero partition checkpoints for rank 0 checkpoint version 3.0 loading 8 zero partition checkpoints for rank 153 loading 8 zero partition checkpoints for rank 233 loading 8 zero partition checkpoints for rank 248 loading 8 zero partition checkpoints for rank 242 loading 8 zero partition checkpoints for rank 252 loading 8 zero partition checkpoints for rank 232 loading 8 zero partition checkpoints for rank 1 loading 8 zero partition checkpoints for rank 249 loading 8 zero partition checkpoints for rank 253 loading 8 zero partition checkpoints for rank 28 loading 8 zero partition checkpoints for rank 31 loading 8 zero partition checkpoints for rank 29 loading 8 zero partition checkpoints for rank 30 successfully loaded 8 ZeRO state_dicts for rank 5 loading 8 zero partition checkpoints for rank 5 successfully loaded 8 ZeRO state_dicts for rank 6 successfully loaded 8 ZeRO state_dicts for rank 4 successfully loaded 8 ZeRO state_dicts for rank 7 loading 8 zero partition checkpoints for rank 6 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-25 02:38:42 CEST)" was missed by 0:00:03.040753 loading 8 zero partition checkpoints for rank 4 loading 8 zero partition checkpoints for rank 7 successfully loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints at 
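The per-rank messages above (8 ZeRO state_dicts loaded per rank, 4 layers per pipeline stage out of 32 total) are consistent with a 3D-parallel layout. A quick sanity check, assuming a 256-GPU world size as the rank IDs (0-255) suggest, and that the number of ZeRO shards equals the data-parallel degree:

```python
# Sketch: inferring the 3D-parallel topology from the log above.
# Assumptions (not stated explicitly in these lines): world size 256,
# and ZeRO shard count == data-parallel degree.
world_size = 256
zero_shards_per_rank = 8         # "successfully loaded 8 ZeRO state_dicts"
total_layers = 32                # "... with 32 total layers"
layers_per_stage = 4             # "STAGE=1 LAYERS=4 [7, 11)"

dp = zero_shards_per_rank              # data parallelism      -> 8
pp = total_layers // layers_per_stage  # pipeline stages       -> 8
tp = world_size // (dp * pp)           # tensor parallelism    -> 4
print(dp, pp, tp)
```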
successfully loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints at iteration 5827
time (ms) | load-checkpoint: 94708.03
[after model, optimizer, and learning rate scheduler are built] datetime: 2021-09-25 02:37:49
> building train, validation, and test datasets ...
> datasets target sizes (minimum size):
    train:      300000000
    validation: 1638400
    test:       10240
> building train, validation, and test datasets for GPT ...
 > building dataset index ...
    reading sizes... reading pointers... reading document index... creating numpy buffer of mmap... creating memory view of numpy buffer...
 > finished creating indexed dataset in 0.199121 seconds
    number of documents: 304230423
 > dataset split:
    train:      document indices in [0, 288714672) total of 288714672 documents
    validation: document indices in [288714672, 303926193) total of 15211521 documents
    test:       document indices in [303926193, 304230423) total of 304230 documents
 > loading doc-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_train_indexmap_300000000ns_2048sl_42s_doc_idx.npy
 > loading sample-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_train_indexmap_300000000ns_2048sl_42s_sample_idx.npy
 > loading shuffle-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_train_indexmap_300000000ns_2048sl_42s_shuffle_idx.npy
    loaded indexed file in 0.460 seconds
    total number of samples: 394611670
    total number of epochs: 3
 > loading doc-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_valid_indexmap_1638400ns_2048sl_42s_doc_idx.npy
 > loading sample-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_valid_indexmap_1638400ns_2048sl_42s_sample_idx.npy
 > loading shuffle-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_valid_indexmap_1638400ns_2048sl_42s_shuffle_idx.npy
    loaded indexed file in 0.335 seconds
    total number of samples: 6927161
    total number of epochs: 1
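The document split above tiles the full corpus exactly, and the ratios match a 949/50/1-style Megatron split (the ratio is an inference; only the absolute counts appear in the log):

```python
# Sanity check of the dataset split logged above.
total = 304_230_423                                     # number of documents
train, valid, test = 288_714_672, 15_211_521, 304_230   # split sizes from the log
assert train + valid + test == total                    # the three ranges tile [0, total)
# Ratios come out to ~94.9% / 5% / 0.1%:
print(round(train / total, 4), round(valid / total, 4), round(test / total, 4))
```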
 > loading doc-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_test_indexmap_10240ns_2048sl_42s_doc_idx.npy
 > loading sample-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_test_indexmap_10240ns_2048sl_42s_sample_idx.npy
 > loading shuffle-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_test_indexmap_10240ns_2048sl_42s_shuffle_idx.npy
    loaded indexed file in 0.163 seconds
    total number of samples: 137384
    total number of epochs: 1
> finished creating GPT datasets ...
[after dataloaders are built] datetime: 2021-09-25 02:37:56
done with setup ...
training ...
time (ms) | model-and-optimizer-setup: 102787.57 | train/valid/test-data-iterators-setup: 6275.52
[before the start of training step] datetime: 2021-09-25 02:37:56
[2021-09-25 02:37:56,930] [INFO] [checkpointing.py:408:forward] Activation Checkpointing Information
[2021-09-25 02:37:56,931] [INFO] [checkpointing.py:409:forward] ----Partition Activations False, CPU CHECKPOINTING False
[2021-09-25 02:37:56,931] [INFO] [checkpointing.py:412:forward] ----contiguous Memory Checkpointing False with 32 total layers
[2021-09-25 02:37:56,931] [INFO] [checkpointing.py:415:forward] ----Synchronization False
[2021-09-25 02:37:56,931] [INFO] [checkpointing.py:416:forward] ----Profiling time in checkpointing False
[Rank 1] (after 5830 iterations) memory (MB) | allocated: 6685.79931640625 | max allocated: 13590.94921875 | reserved: 22862.0 | max reserved: 22862.0
[Rank 225] (after 5830 iterations) memory (MB) | allocated: 7107.7109375 | max allocated: 11885.68701171875 | reserved: 22492.0 | max reserved: 22492.0
[Rank 2] (after 5830 iterations) memory (MB) | allocated: 6685.79931640625 | max allocated: 13590.94921875 | reserved: 22862.0 | max reserved: 22862.0
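The "Activation Checkpointing Information" flags logged above correspond to DeepSpeed's `activation_checkpointing` config block. A hypothetical fragment matching the logged values (an illustration of the mapping, not the run's actual config file):

```json
{
  "activation_checkpointing": {
    "partition_activations": false,
    "cpu_checkpointing": false,
    "contiguous_memory_optimization": false,
    "synchronize_checkpoint_boundary": false,
    "profile": false
  }
}
```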
[Rank 226] (after 5830 iterations) memory (MB) | allocated: 7107.7109375 | max allocated: 11885.6865234375 | reserved: 20752.0 | max reserved: 20752.0
[Rank 224] (after 5830 iterations) memory (MB) | allocated: 7107.7109375 | max allocated: 11885.6875 | reserved: 22492.0 | max reserved: 22492.0
[Rank 0] (after 5830 iterations) memory (MB) | allocated: 6685.79931640625 | max allocated: 13590.94921875 | reserved: 23246.0 | max reserved: 23246.0
[Rank 3] (after 5830 iterations) memory (MB) | allocated: 6685.79931640625 | max allocated: 13590.94921875 | reserved: 22862.0 | max reserved: 22862.0
[Rank 227] (after 5830 iterations) memory (MB) | allocated: 7107.7109375 | max allocated: 11885.68701171875 | reserved: 22492.0 | max reserved: 22492.0
iteration 5830/ 159576 | consumed samples: 168368 | elapsed time per iteration (ms): 21875.4 | learning rate: 4.656E-05 | global batch size: 64 | lm loss: 6.454423E+00 | loss scale: 2048.0 | grad norm: 45630.759 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
[Rank 65] (after 5830 iterations) memory (MB) | allocated: 5861.55029296875 | max allocated: 11810.46728515625 | reserved: 19902.0 | max reserved: 19902.0
[... analogous memory reports for ranks 32-35, 64, 66-67, 96-99, 128-131, 160-163 and 192-195: allocated is 5861.55029296875 MB on all of them, max allocated decreases with pipeline stage from 12082.4677734375 MB (ranks 32-35) to 10722.46533203125 MB (ranks 192-195), and max reserved ranges from 18826.0 to 20536.0 MB ...]
iteration 5840/ 159576 | consumed samples: 169008 | elapsed time per iteration (ms): 16822.3 | learning rate: 4.674E-05 | global batch size: 64 | lm loss: 6.392004E+00 | loss scale: 2048.0 | grad norm: 53106.299 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5850/ 159576 | consumed samples: 169648 | elapsed time per iteration (ms): 16813.6 | learning rate: 4.692E-05 | global batch size: 64 | lm loss: 6.347363E+00 | loss scale: 2048.0 | grad norm: 53512.430 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5860/ 159576 | consumed samples: 170288 | elapsed time per iteration (ms): 16773.5 | learning rate: 4.709E-05 | global batch size: 64 | lm loss: 6.368040E+00 | loss scale: 2048.0 | grad norm: 49687.313 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5870/ 159576 | consumed samples: 170928 | elapsed time per iteration (ms): 16844.9 | learning rate: 4.727E-05 | global batch size: 64 | lm loss: 6.372821E+00 | loss scale: 2048.0 | grad norm: 49107.159 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
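The iteration lines above are internally consistent: consumed samples advance by the global batch size each step, and a rough token throughput follows once a sequence length of 2048 is assumed (taken from the `2048sl` index-map filenames, not from these lines):

```python
# Consumed-samples delta between the logged iterations 5830 and 5840:
assert 169008 - 168368 == 10 * 64   # 10 iterations x global batch size 64

# Rough throughput around iteration 5840 (sequence length 2048 inferred
# from the "2048sl" index-map filenames):
tokens_per_iter = 64 * 2048
tokens_per_sec = tokens_per_iter / 16.8223   # 16822.3 ms per iteration
print(f"{tokens_per_sec:.0f} tokens/s")      # on the order of 7.8e3
```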
64 | lm loss: 6.379050E+00 | loss scale: 2048.0 | grad norm: 76898.126 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5890/ 159576 | consumed samples: 172208 | elapsed time per iteration (ms): 16819.7 | learning rate: 4.763E-05 | global batch size: 64 | lm loss: 6.333071E+00 | loss scale: 2048.0 | grad norm: 69874.656 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5900/ 159576 | consumed samples: 172848 | elapsed time per iteration (ms): 16821.3 | learning rate: 4.780E-05 | global batch size: 64 | lm loss: 6.354385E+00 | loss scale: 2048.0 | grad norm: 57915.999 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5910/ 159576 | consumed samples: 173488 | elapsed time per iteration (ms): 16679.9 | learning rate: 4.798E-05 | global batch size: 64 | lm loss: 6.361916E+00 | loss scale: 2048.0 | grad norm: 56535.869 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5920/ 159576 | consumed samples: 174128 | elapsed time per iteration (ms): 16731.8 | learning rate: 4.816E-05 | global batch size: 64 | lm loss: 6.371978E+00 | loss scale: 2048.0 | grad norm: 75613.913 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5930/ 159576 | consumed samples: 174768 | elapsed time per iteration (ms): 16796.3 | learning rate: 4.834E-05 | global batch size: 64 | lm loss: 6.373956E+00 | loss scale: 2048.0 | grad norm: 64436.905 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) [2021-09-25 03:08:32] PULSE: tr8-104B is waiting for the previous job to finish before scheduling a new one using the dependency mechanism (1185639_[1-10%1] on 'gpu_p13' partition) [2021-09-25 03:08:32] PULSE: tr8-104B is running for 33:04 since 2021-09-25T02:35:28 (1185609 on 'gpu_p13' partition 
(r6i5n[7-8],r6i6n0,r7i2n[4-5],r7i6n[2-4],r7i7n[7-8],r8i0n[0,2-3,5-8],r8i1n[0,2-4],r8i2n8,r8i3n[0-2],r8i5n[3-4],r8i7n[3-8],r9i0n[0-5],r9i1n[0-3],r9i2n[3-5,8],r9i3n[0-1,7-8],r9i4n[0-3],r9i5n[3-8],r9i6n[0,7-8]) iteration 5940/ 159576 | consumed samples: 175408 | elapsed time per iteration (ms): 16680.4 | learning rate: 4.851E-05 | global batch size: 64 | lm loss: 6.367229E+00 | loss scale: 2048.0 | grad norm: 61103.619 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5950/ 159576 | consumed samples: 176048 | elapsed time per iteration (ms): 16548.2 | learning rate: 4.869E-05 | global batch size: 64 | lm loss: 6.365273E+00 | loss scale: 2048.0 | grad norm: 74137.806 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5960/ 159576 | consumed samples: 176688 | elapsed time per iteration (ms): 16720.7 | learning rate: 4.887E-05 | global batch size: 64 | lm loss: 6.339179E+00 | loss scale: 2048.0 | grad norm: 117906.851 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5970/ 159576 | consumed samples: 177328 | elapsed time per iteration (ms): 16666.6 | learning rate: 4.905E-05 | global batch size: 64 | lm loss: 6.366007E+00 | loss scale: 2048.0 | grad norm: 135736.452 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5980/ 159576 | consumed samples: 177968 | elapsed time per iteration (ms): 16712.0 | learning rate: 4.922E-05 | global batch size: 64 | lm loss: 6.311417E+00 | loss scale: 2048.0 | grad norm: 59672.483 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5990/ 159576 | consumed samples: 178608 | elapsed time per iteration (ms): 16795.9 | learning rate: 4.940E-05 | global batch size: 64 | lm loss: 6.346366E+00 | loss scale: 2048.0 | grad norm: 70394.026 | num zeros: 0.0 | number of skipped iterations: 0 | 
number of nan iterations: 0 | time (ms) [2021-09-25 03:26:24,359] [INFO] [logging.py:68:log_dist] [Rank 0] step=6000, skipped=13, lr=[4.9579881656804734e-05, 4.9579881656804734e-05], mom=[(0.9, 0.999), (0.9, 0.999)] steps: 6000 loss: 6.4051 iter time (s): 0.008 samples/sec: 7888.018 iteration 6000/ 159576 | consumed samples: 179248 | elapsed time per iteration (ms): 16825.1 | learning rate: 4.958E-05 | global batch size: 64 | lm loss: 6.338142E+00 | loss scale: 2048.0 | grad norm: 51469.855 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) ------------------------------------------------------------------------------------------------ validation loss at iteration 6000 | lm loss value: 6.305492E+00 | lm loss PPL: 5.475711E+02 | ------------------------------------------------------------------------------------------------ saving checkpoint at iteration 6000 to /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints [2021-09-25 03:26:46,630] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/global_step6000/mp_rank_00_model_states.pt successfully saved checkpoint at iteration 6000 to /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints time (ms) | save-checkpoint: 18535.85 iteration 6010/ 159576 | consumed samples: 179888 | elapsed time per iteration (ms): 19605.0 | learning rate: 4.976E-05 | global batch size: 64 | lm loss: 6.332598E+00 | loss scale: 2048.0 | grad norm: 64216.775 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 6020/ 159576 | consumed samples: 180528 | elapsed time per iteration (ms): 16682.2 | learning rate: 4.993E-05 | global batch size: 64 | lm loss: 6.346989E+00 | loss scale: 2048.0 | grad norm: 65052.382 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 6030/ 159576 | consumed samples: 181168 | elapsed 
time per iteration (ms): 16536.1 | learning rate: 5.011E-05 | global batch size: 64 | lm loss: 6.314711E+00 | loss scale: 2048.0 | grad norm: 61186.621 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 6040/ 159576 | consumed samples: 181808 | elapsed time per iteration (ms): 16509.4 | learning rate: 5.029E-05 | global batch size: 64 | lm loss: 6.347876E+00 | loss scale: 2048.0 | grad norm: 80684.961 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 6050/ 159576 | consumed samples: 182448 | elapsed time per iteration (ms): 16821.6 | learning rate: 5.047E-05 | global batch size: 64 | lm loss: 6.345741E+00 | loss scale: 2048.0 | grad norm: 207970.428 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 6060/ 159576 | consumed samples: 183088 | elapsed time per iteration (ms): 16815.3 | learning rate: 5.064E-05 | global batch size: 64 | lm loss: 6.341463E+00 | loss scale: 2048.0 | grad norm: 57913.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 6070/ 159576 | consumed samples: 183728 | elapsed time per iteration (ms): 16825.8 | learning rate: 5.082E-05 | global batch size: 64 | lm loss: 6.336625E+00 | loss scale: 2048.0 | grad norm: 62496.040 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 6080/ 159576 | consumed samples: 184368 | elapsed time per iteration (ms): 16749.3 | learning rate: 5.100E-05 | global batch size: 64 | lm loss: 6.378619E+00 | loss scale: 2048.0 | grad norm: 53421.784 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 6090/ 159576 | consumed samples: 185008 | elapsed time per iteration (ms): 16844.2 | learning rate: 5.118E-05 | global batch size: 64 | lm loss: 6.363810E+00 | loss scale: 2048.0 | grad norm: 53621.070 | num zeros: 0.0 
| number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 6100/ 159576 | consumed samples: 185648 | elapsed time per iteration (ms): 16803.1 | learning rate: 5.136E-05 | global batch size: 64 | lm loss: 6.397610E+00 | loss scale: 2048.0 | grad norm: 63234.859 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 6110/ 159576 | consumed samples: 186288 | elapsed time per iteration (ms): 16808.5 | learning rate: 5.153E-05 | global batch size: 64 | lm loss: 6.359557E+00 | loss scale: 2048.0 | grad norm: 52582.093 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 6120/ 159576 | consumed samples: 186928 | elapsed time per iteration (ms): 16792.9 | learning rate: 5.171E-05 | global batch size: 64 | lm loss: 6.347573E+00 | loss scale: 2048.0 | grad norm: 50959.488 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 6130/ 159576 | consumed samples: 187568 | elapsed time per iteration (ms): 16806.7 | learning rate: 5.189E-05 | global batch size: 64 | lm loss: 6.351057E+00 | loss scale: 2048.0 | grad norm: 152670.439 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 6140/ 159576 | consumed samples: 188208 | elapsed time per iteration (ms): 16808.0 | learning rate: 5.207E-05 | global batch size: 64 | lm loss: 6.374673E+00 | loss scale: 2048.0 | grad norm: 50742.188 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) [2021-09-25 04:08:28] PULSE: tr8-104B is waiting for the previous job to finish before scheduling a new one using the dependency mechanism (1185639_[1-10%1] on 'gpu_p13' partition) [2021-09-25 04:08:28] PULSE: tr8-104B is running for 1:33:00 since 2021-09-25T02:35:28 (1185609 on 'gpu_p13' partition 
(r6i5n[7-8],r6i6n0,r7i2n[4-5],r7i6n[2-4],r7i7n[7-8],r8i0n[0,2-3,5-8],r8i1n[0,2-4],r8i2n8,r8i3n[0-2],r8i5n[3-4],r8i7n[3-8],r9i0n[0-5],r9i1n[0-3],r9i2n[3-5,8],r9i3n[0-1,7-8],r9i4n[0-3],r9i5n[3-8],r9i6n[0,7-8]) iteration 6150/ 159576 | consumed samples: 188848 | elapsed time per iteration (ms): 16696.6 | learning rate: 5.224E-05 | global batch size: 64 | lm loss: 6.323299E+00 | loss scale: 2048.0 | grad norm: 55101.801 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 6160/ 159576 | consumed samples: 189600 | elapsed time per iteration (ms): 17385.3 | learning rate: 5.245E-05 | global batch size: 80 | lm loss: 6.368839E+00 | loss scale: 2048.0 | grad norm: 51296.238 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 6170/ 159576 | consumed samples: 190400 | elapsed time per iteration (ms): 17823.6 | learning rate: 5.267E-05 | global batch size: 80 | lm loss: 6.355129E+00 | loss scale: 2048.0 | grad norm: 85490.491 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 6180/ 159576 | consumed samples: 191200 | elapsed time per iteration (ms): 17757.4 | learning rate: 5.289E-05 | global batch size: 80 | lm loss: 6.373211E+00 | loss scale: 2048.0 | grad norm: 112584.865 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 6190/ 159576 | consumed samples: 192000 | elapsed time per iteration (ms): 17583.1 | learning rate: 5.312E-05 | global batch size: 80 | lm loss: 6.372861E+00 | loss scale: 2048.0 | grad norm: 102723.952 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 6200/ 159576 | consumed samples: 192800 | elapsed time per iteration (ms): 17380.3 | learning rate: 5.334E-05 | global batch size: 80 | lm loss: 6.336594E+00 | loss scale: 2048.0 | grad norm: 41950.188 | num zeros: 0.0 | number of skipped iterations: 0 | 
number of nan iterations: 0 | time (ms) iteration 6210/ 159576 | consumed samples: 193600 | elapsed time per iteration (ms): 17443.3 | learning rate: 5.356E-05 | global batch size: 80 | lm loss: 6.355303E+00 | loss scale: 4096.0 | grad norm: 131493.175 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) saving checkpoint at iteration 6210 to /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints [2021-09-25 04:26:31,836] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/global_step6210/mp_rank_00_model_states.pt successfully saved checkpoint at iteration 6210 to /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints time (ms) | save-checkpoint: 19730.09 [exiting program after 110.16574249267578 minutes] datetime: 2021-09-25 04:26:46 ***************************************** Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. ***************************************** ***************************************** Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. ***************************************** ***************************************** Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. 
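Aside: the validation report at iteration 6000 prints both "lm loss value" and "lm loss PPL". Assuming the loss is a cross-entropy in nats (as Megatron-LM reports it), the PPL column is simply exp(loss), which the logged numbers confirm:

```python
import math

# Figures copied from the iteration-6000 validation block in this log.
lm_loss = 6.305492        # lm loss value: 6.305492E+00
reported_ppl = 547.5711   # lm loss PPL:   5.475711E+02

# Perplexity = exp(cross-entropy) when the loss is measured in nats.
computed_ppl = math.exp(lm_loss)
print(round(computed_ppl, 2))  # matches the reported PPL to rounding
```

This is only a consistency check on the log's own numbers, not part of the training code.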
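A second sanity check on the log: between consecutive reports, "consumed samples" should grow by global batch size × 10 logged iterations. This holds at batch size 64 (deltas of 640) and 80 (deltas of 800); the 6150→6160 window straddles the batch-size ramp-up, so its delta of 752 is a mix of 64- and 80-sample steps (3×64 + 7×80):

```python
# (iteration, consumed samples, global batch size) rows copied from the log above.
rows = [
    (6140, 188208, 64),
    (6150, 188848, 64),
    (6160, 189600, 80),  # batch size ramps 64 -> 80 inside this window
    (6170, 190400, 80),
    (6180, 191200, 80),
]

for (it0, s0, _), (it1, s1, bs1) in zip(rows, rows[1:]):
    delta = s1 - s0
    steps = it1 - it0
    print(f"{it0}->{it1}: +{delta} samples "
          f"({'ok' if delta == bs1 * steps else 'ramp-up window'})")
```

The mixed window checks out exactly: 3 × 64 + 7 × 80 = 752.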
--------------------------------------------------
DeepSpeed C++/CUDA extension op report
--------------------------------------------------
NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.
--------------------------------------------------
JIT compiled ops requires ninja
ninja .................. [OKAY]
--------------------------------------------------
op name ................ installed .. compatible
--------------------------------------------------
cpu_adam ............... [YES] ...... [OKAY]
fused_adam ............. [NO] ....... [OKAY]
fused_lamb ............. [NO] ....... [OKAY]
sparse_attn ............ [NO] ....... [OKAY]
transformer ............ [NO] ....... [OKAY]
stochastic_transformer . [NO] ....... [OKAY]
installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... 
[OKAY] stochastic_transformer . [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. 
-------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. 
[NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. 
compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] --------------------------------------------------fused_adam ............. 
DeepSpeed C++/CUDA extension op report[NO] .......-------------------------------------------------- [OKAY]NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninjafused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. 
-------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adamninja ............. ..................[NO] .......[OKAY] [OKAY] -------------------------------------------------- op name fused_lamb................ .............installed [NO].. .......compatible [OKAY] -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] sparse_attn fused_adam ............. [NO] ................... [NO][OKAY] ....... [OKAY]fused_lamb ............. [NO] .......transformer [OKAY]............ [NO] ....... 
[OKAY] -------------------------------------------------- stochastic_transformer . [NO]sparse_attn ................... [OKAY][NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja stochastic_transformer . [NO] ....... [OKAY] ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. 
Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] ninja .................. 
[OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... 
[OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. 
Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ 
installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... 
--------------------------------------------------
DeepSpeed C++/CUDA extension op report
--------------------------------------------------
NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.
--------------------------------------------------
JIT compiled ops requires ninja
ninja .................. [OKAY]
--------------------------------------------------
op name ................ installed .. compatible
--------------------------------------------------
cpu_adam ............... [YES] ...... [OKAY]
fused_adam ............. [NO] ....... [OKAY]
fused_lamb ............. [NO] ....... [OKAY]
sparse_attn ............ [NO] ....... [OKAY]
transformer ............ [NO] ....... [OKAY]
stochastic_transformer . [NO] ....... [OKAY]
--------------------------------------------------
 [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`.
async_io ............... [NO] ....... [NO]
transformer_inference .. [NO] ....... [OKAY]
utils .................. [YES] ...... [OKAY]
quantizer .............. [NO] ....... [OKAY]
--------------------------------------------------
DeepSpeed general environment info:
torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']
torch version .................... 1.8.1
torch cuda version ............... 11.1
nvcc version ..................... 11.2
deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']
deepspeed info ................... 0.4.2+bc17042, bc17042, big-science
deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1
[OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report fused_adam ............. [NO] ....... [OKAY] -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. fused_lamb ............. [NO] ....... [OKAY] -------------------------------------------------- JIT compiled ops requires ninja sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... 
[OKAY] -------------------------------------------------- -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. 
Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... 
[OKAY] -------------------------------------------------- -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam-------------------------------------------------- ............... 
[YES] DeepSpeed C++/CUDA extension op report...... [OKAY]-------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja fused_adam ............. [NO] ....... [OKAY] -------------------------------------------------- fused_lambninja ............. ..................[NO] [OKAY]....... [OKAY]-------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. op name ................ installed .. compatible -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- sparse_attn ............ [NO] .......cpu_adam [OKAY]............... [YES] ......transformer [OKAY]............ [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- stochastic_transformerfused_adam .............. [NO][NO] ....... .......[OKAY] JIT compiled ops requires ninja [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... 
[OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. -------------------------------------------------- async_io ............... [NO] ....... [NO] op name ................ installed .. compatible -------------------------------------------------- transformer_inference .. [NO] ....... [OKAY] cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] fused_lamb ............. [NO] ....... [OKAY] quantizer .............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] -------------------------------------------------- transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`.  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- ninja .................. 
[OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- ninja .................. [OKAY] cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] -------------------------------------------------- op name ................ installed .. compatible fused_lamb ............. [NO] ....... [OKAY] -------------------------------------------------- sparse_attn ............ [NO] ....... [OKAY] cpu_adam ............... [YES] ...... [OKAY] transformer ............ [NO] ....... [OKAY] fused_adam ............. [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system transformer ............ [NO] ....... 
[OKAY] meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja stochastic_transformer . [NO] ....... [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`.  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] transformer_inference .. [NO] ....... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. 
Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. 
-------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] ninja .................. [OKAY] stochastic_transformer . [NO] ....... [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... 
[YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. 
[OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... 
[OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... 
--------------------------------------------------
DeepSpeed C++/CUDA extension op report
--------------------------------------------------
NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.
--------------------------------------------------
JIT compiled ops requires ninja
ninja .................. [OKAY]
--------------------------------------------------
op name ................ installed .. compatible
--------------------------------------------------
cpu_adam ............... [YES] ...... [OKAY]
fused_adam ............. [NO] ....... [OKAY]
fused_lamb ............. [NO] ....... [OKAY]
sparse_attn ............ [NO] ....... [OKAY]
transformer ............ [NO] ....... [OKAY]
stochastic_transformer . [NO] ....... [OKAY]
 [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`.
async_io ............... [NO] ....... [NO]
transformer_inference .. [NO] ....... [OKAY]
utils .................. [YES] ...... [OKAY]
quantizer .............. [NO] ....... [OKAY]
--------------------------------------------------
DeepSpeed general environment info:
torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']
torch version .................... 1.8.1
torch cuda version ............... 11.1
nvcc version ..................... 11.2
deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']
deepspeed info ................... 0.4.2+bc17042, bc17042, big-science
deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1
[... identical op report and environment info repeated by each remaining rank ...]
[NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed general environment info: -------------------------------------------------- torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ 
[NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report --------------------------------------------------sparse_attn ............ -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja [NO]DeepSpeed C++/CUDA extension op report ....... --------------------------------------------------[OKAY] NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. transformer --------------------------------------------------............ JIT compiled ops requires ninja[NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... 
[OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. transformer ............ [NO] ....... [OKAY] async_io ............... [NO] ....... [NO] stochastic_transformer . [NO] ....... [OKAY] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible fused_adam ............. [NO] ....... [OKAY] -------------------------------------------------- fused_lamb ............. [NO] ....... [OKAY] cpu_adam ............... [YES] ...... [OKAY] sparse_attn ............ [NO] ....... [OKAY] fused_adam ............. [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`.  
[WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info:DeepSpeed general environment info: torch install pathtorch install path .............................. 
['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch versiontorch version ........................................ 1.8.11.8.1 torch cuda versiontorch cuda version .............................. 11.111.1 nvcc versionnvcc version .......................................... 11.211.2 deepspeed install pathdeepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science ...........deepspeed wheel compiled w. ......['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] torch 1.8, cuda 11.1 deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] DeepSpeed general environment info: sparse_attn ............ [NO] ....... [OKAY] torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] async_io ............... [NO] ....... [NO] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. 
...... torch 1.8, cuda 11.1 transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- /bin/sh: line 0: type: git: not found -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** ninja .................. [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- -------------------------------------------------- JIT compiled ops requires ninja op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... 
[OKAY] stochastic_transformer . [NO] ....... [OKAY] DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ...................DeepSpeed general environment info: 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] ninja .................. [OKAY] -------------------------------------------------- transformer_inference .. [NO] ....... [OKAY] op name ................ installed .. compatible -------------------------------------------------- utils .................. [YES] ...... [OKAY] cpu_adam ............... [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] DeepSpeed general environment info: torch install path ............... 
['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... 
[OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. -------------------------------------------------- async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... 
['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] DeepSpeed general environment info: sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] stochastic_transformer . [NO] ....... [OKAY] torch version .................... 1.8.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] async_io ............... [NO] ....... [NO] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. 
Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed general environment info:DeepSpeed general environment info: torch install pathtorch install path .............................. ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version torch version.................... .................... 1.8.1 1.8.1 torch cuda version torch cuda version............... ...............11.1 11.1 nvcc version nvcc version..................... .....................11.2 11.2deepspeed install path ...........deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']deepspeed info ...................deepspeed info ...................0.4.2+bc17042, bc17042, big-science 0.4.2+bc17042, bc17042, big-sciencedeepspeed wheel compiled w. deepspeed wheel compiled w....... 
......torch 1.8, cuda 11.1 torch 1.8, cuda 11.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... DeepSpeed general environment info:11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed infotorch install path ................... ...............0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... 
torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 -------------------------------------------------- DeepSpeed general environment info: DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... 
torch 1.8, cuda 11.1
 [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`.
--------------------------------------------------
DeepSpeed C++/CUDA extension op report
--------------------------------------------------
NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.
--------------------------------------------------
JIT compiled ops requires ninja
ninja .................. [OKAY]
--------------------------------------------------
op name ................ installed .. compatible
--------------------------------------------------
cpu_adam ............... [YES] ...... [OKAY]
fused_adam ............. [NO] ....... [OKAY]
fused_lamb ............. [NO] ....... [OKAY]
sparse_attn ............ [NO] ....... [OKAY]
transformer ............ [NO] ....... [OKAY]
stochastic_transformer . [NO] ....... [OKAY]
async_io ............... [NO] ....... [NO]
transformer_inference .. [NO] ....... [OKAY]
utils .................. [YES] ...... [OKAY]
quantizer .............. [NO] ....... [OKAY]
--------------------------------------------------
DeepSpeed general environment info:
torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']
torch version .................... 1.8.1
torch cuda version ............... 11.1
nvcc version ..................... 11.2
deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']
deepspeed info ................... 0.4.2+bc17042, bc17042, big-science
deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1
/bin/sh: line 0: type: git: not found
**** Git info for Megatron: git_hash=unknown git_branch=unknown ****
[NO] fused_adam ............. [NO] ....... [OKAY] torch version .................... 1.8.1 fused_lamb ............. [NO] ....... [OKAY] torch cuda version ............... 11.1 transformer_inference .. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] nvcc version ..................... 11.2 utils .................. [YES] ...... [OKAY] transformer ............ [NO] ....... [OKAY] quantizer .............. [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] -------------------------------------------------- deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- ninja .................. [OKAY] cpu_adam ............... [YES] ...... [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- fused_adam ............. [NO] ....... [OKAY] cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... 
[OKAY] -------------------------------------------------- DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja DeepSpeed general environment info: torch install path ............... 
['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. DeepSpeed general environment info: async_ioasync_io .............................. [NO][NO] .............. [NO][NO]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] async_io ............... [NO] ....... [NO] torch version .................... 1.8.1 transformer_inferencetransformer_inference .... [NO][NO] .............. [OKAY][OKAY] torch cuda version ............... 11.1 transformer_inference .. [NO] ....... [OKAY] utils utils.................. ..................[YES] [YES]...... ......[OKAY] nvcc version ..................... 11.2 utils .................. [YES] ...... [OKAY] deepspeed install path ........... 
['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] quantizer .............. [NO] ....... [OKAY] [OKAY] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 -------------------------------------------------- quantizer quantizer.............. ..............[NO] [NO]....... .......[OKAY] [OKAY] -------------------------------------------------- -------------------------------------------------- ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. ninja .................. [OKAY] async_io ............... [NO] ....... [NO] -------------------------------------------------- op name ................ installed .. compatible transformer_inference .. [NO] ....... [OKAY] -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] utils .................. [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] quantizer .............. [NO] ....... 
[OKAY] -------------------------------------------------- fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] DeepSpeed general environment info: utils .................. [YES] ...... [OKAY] torch install path ............... 
['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 quantizer .............. [NO] ....... [OKAY] torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] -------------------------------------------------- deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] ninja .................. [OKAY] -------------------------------------------------- deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] DeepSpeed general environment info: fused_adam ............. [NO] ....... [OKAY] torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] torch version .................... 1.8.1 torch cuda version ............... 11.1 transformer ............ [NO] ....... [OKAY] nvcc version ..................... 11.2 stochastic_transformer . [NO] ....... [OKAY] deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 
0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] -------------------------------------------------- utils .................. [YES] ...... [OKAY] DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... 
[OKAY] /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- transformer_inference .. [NO] ....... [OKAY] JIT compiled ops requires ninja utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... 
[NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] /bin/sh: line 0: type: git: not found -------------------------------------------------- /bin/sh: line 0: type: git: not found DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] **** Git info for Megatron: git_hash=unknown git_branch=unknown **** torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 **** Git info for Megatron: git_hash=unknown git_branch=unknown **** deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info:  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`.  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] async_io ............... [NO] ....... [NO] torch version .................... 1.8.1 async_io ............... [NO] ....... [NO] torch cuda version ............... 11.1 transformer_inference .. [NO] ....... [OKAY] nvcc version ..................... 11.2 transformer_inference .. [NO] ....... [OKAY] deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] utils .................. [YES] ...... [OKAY] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... 
[OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- -------------------------------------------------- -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report DeepSpeed general environment info: -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] ninja .................. [OKAY] -------------------------------------------------- -------------------------------------------------- JIT compiled ops requires ninja torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science op name ................ installed .. compatible -------------------------------------------------- deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 async_io ............... [NO] ....... [NO] cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] transformer_inference .. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... 
[OKAY] sparse_attn ............ [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] transformer ............ [NO] ....... [OKAY] quantizer .............. [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed general environment info: -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 JIT compiled ops requires ninjaninja .................. [OKAY] -------------------------------------------------- torch cuda version ............... 11.1 op name ................ installed .. compatible -------------------------------------------------- nvcc version ..................... 11.2  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. cpu_adam ............... [YES] ...... [OKAY] deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] async_io ............... [NO] ....... [NO] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 transformer_inference .. [NO] ....... [OKAY] fused_adam ............. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] fused_lamb ............. [NO] ....... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... 
[OKAY] /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY]ninja .................. [OKAY] -------------------------------------------------- op name ................ installedfused_adam ............... compatible[NO] .......-------------------------------------------------- [OKAY] fused_lamb ............. [NO]cpu_adam ...................... [OKAY][YES] ...... [OKAY] sparse_attn fused_adam............ .............[NO] [NO]....... .......[OKAY] [OKAY] transformer ............fused_lamb [NO]............. .......[NO] [OKAY]....... ninja .................. [OKAY] [OKAY] stochastic_transformer . [NO] ....... [OKAY] -------------------------------------------------- sparse_attn ............ [NO] ....... [OKAY] op name ................ installed .. compatible -------------------------------------------------- transformer ............ [NO] ....... [OKAY] cpu_adam ............... [YES] ...... [OKAY] stochastic_transformer . [NO] ....... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. 
async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja DeepSpeed general environment info: ninja .................. [OKAY] torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 -------------------------------------------------- nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science op name ................ installed .. compatible -------------------------------------------------- deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... 
--------------------------------------------------
DeepSpeed C++/CUDA extension op report
--------------------------------------------------
NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.
--------------------------------------------------
JIT compiled ops requires ninja
ninja .................. [OKAY]
--------------------------------------------------
op name ................ installed .. compatible
--------------------------------------------------
cpu_adam ............... [YES] ...... [OKAY]
fused_adam ............. [NO] ....... [OKAY]
fused_lamb ............. [NO] ....... [OKAY]
sparse_attn ............ [NO] ....... [OKAY]
transformer ............ [NO] ....... [OKAY]
stochastic_transformer . [NO] ....... [OKAY]
--------------------------------------------------
 [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`.
async_io ............... [NO] ....... [NO]
transformer_inference .. [NO] ....... [OKAY]
utils .................. [YES] ...... [OKAY]
quantizer .............. [NO] ....... [OKAY]
--------------------------------------------------
DeepSpeed general environment info:
torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']
torch version .................... 1.8.1
torch cuda version ............... 11.1
nvcc version ..................... 11.2
deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']
deepspeed info ................... 0.4.2+bc17042, bc17042, big-science
deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1
/bin/sh: line 0: type: git: not found
**** Git info for Megatron: git_hash=unknown git_branch=unknown ****
[... identical op report, environment info, and warning output repeated by every other rank; interleaved duplicates elided ...]
[OKAY] torch install path torch install path............... ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version torch version.................... ....................1.8.1 1.8.1 utils .................. [YES] ...... [OKAY] torch cuda version torch cuda version............... ...............11.1 11.1 nvcc versionnvcc version .......................................... 11.211.2 quantizer .............. [NO] ....... [OKAY] deepspeed install pathdeepspeed install path ...................... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] -------------------------------------------------- deepspeed infodeepspeed info ...................................... 0.4.2+bc17042, bc17042, big-science0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w.deepspeed wheel compiled w. ............ torch 1.8, cuda 11.1torch 1.8, cuda 11.1 ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY]ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . 
[NO] ....... [OKAY] DeepSpeed general environment info: torch install path ............... DeepSpeed general environment info: ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch install path torch version............... .................... 1.8.1 torch cuda version ...............['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] 11.1 nvcc versiontorch version ......................................... 11.21.8.1 deepspeed install path torch cuda version........... ............... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']11.1 deepspeed infonvcc version ........................................ 11.20.4.2+bc17042, bc17042, big-science deepspeed install path deepspeed wheel compiled w............ ...... torch 1.8, cuda 11.1['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... 
['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`.  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ...............async_io [NO] ...................... [NO][NO] ....... [NO] transformer_inference .. transformer_inference[NO] ......... [NO][OKAY] ....... [OKAY] utils .................. [YES]utils ........................ [OKAY][YES] ...... [OKAY]quantizer .............. [NO] .......quantizer [OKAY].............. [NO] .......-------------------------------------------------- [OKAY] -------------------------------------------------- /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown ****  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja DeepSpeed general environment info:DeepSpeed general environment info: torch install pathtorch install path .............................. 
['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch versiontorch version ........................................ 1.8.11.8.1 torch cuda versiontorch cuda version .............................. 11.111.1 nvcc versionnvcc version .......................................... 11.211.2 deepspeed install pathdeepspeed install path ...................... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed infodeepspeed info ...................................... 0.4.2+bc17042, bc17042, big-science0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w.deepspeed wheel compiled w. ............ torch 1.8, cuda 11.1torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] -------------------------------------------------- quantizer .............. [NO] ....... [OKAY] DeepSpeed C++/CUDA extension op report -------------------------------------------------- -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. 
Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- /bin/sh: line 0: type: git: not found op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] **** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] **** Git info for Megatron: git_hash=unknown git_branch=unknown **** fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] **** Git info for Megatron: git_hash=unknown git_branch=unknown **** transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... 
[OKAY] **** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 **** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] DeepSpeed general environment info: deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 
11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 ninja .................. [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. 
op name ................ installed .. compatible -------------------------------------------------- async_io ............... [NO] ....... [NO] cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] transformer_inference [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. .. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] async_ioquantizer ............................. [NO][NO] .............. [NO][OKAY] stochastic_transformer . [NO] ....... [OKAY] -------------------------------------------------- transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] DeepSpeed general environment info: quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] DeepSpeed general environment info: transformer_inference .. [NO] ....... [OKAY] torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] utils .................. [YES] ...... [OKAY] torch version .................... 1.8.1 quantizer .............. [NO] ....... 
[OKAY] DeepSpeed general environment info: -------------------------------------------------- torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... 
[OKAY] -------------------------------------------------- /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] DeepSpeed general environment info: utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] **** Git info for Megatron: git_hash=unknown git_branch=unknown ******** Git info for Megatron: git_hash=unknown git_branch=unknown **** -------------------------------------------------- torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] /bin/sh: line 0: type: git: not found transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... 
[OKAY] -------------------------------------------------- /bin/sh: line 0: type: git: not found DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown ****  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] /bin/sh: line 0: type: git: not found DeepSpeed general environment info: **** Git info for Megatron: git_hash=unknown git_branch=unknown **** torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 
1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 /bin/sh: line 0: type: git: not found nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io async_io............... [NO]............... .......[NO] [NO]....... [NO] transformer_inference .. [NO] transformer_inference....... ..[OKAY] [NO] ....... [OKAY] utils .................. [YES] utils...... ..................[OKAY] [YES] ...... [OKAY] quantizer .............. quantizer[NO] ..................... [NO][OKAY] ....... [OKAY]-------------------------------------------------- -------------------------------------------------- **** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 
11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] /bin/sh: line 0: type: git: not found -------------------------------------------------- **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown ****  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown ******** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 DeepSpeed general environment info:deepspeed install path ........... 
['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ...................torch install path 0.4.2+bc17042, bc17042, big-science ...............deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. 
--------------------------------------------------
DeepSpeed C++/CUDA extension op report
--------------------------------------------------
NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.
--------------------------------------------------
JIT compiled ops requires ninja
ninja .................. [OKAY]
--------------------------------------------------
op name ................ installed .. compatible
--------------------------------------------------
cpu_adam ............... [YES] ...... [OKAY]
fused_adam ............. [NO] ....... [OKAY]
fused_lamb ............. [NO] ....... [OKAY]
sparse_attn ............ [NO] ....... [OKAY]
transformer ............ [NO] ....... [OKAY]
stochastic_transformer . [NO] ....... [OKAY]
--------------------------------------------------
 [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`.
async_io ............... [NO] ....... [NO]
transformer_inference .. [NO] ....... [OKAY]
utils .................. [YES] ...... [OKAY]
quantizer .............. [NO] ....... [OKAY]
--------------------------------------------------
DeepSpeed general environment info:
torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']
torch version .................... 1.8.1
torch cuda version ............... 11.1
nvcc version ..................... 11.2
deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']
deepspeed info ................... 0.4.2+bc17042, bc17042, big-science
deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1
/bin/sh: line 0: type: git: not found
**** Git info for Megatron: git_hash=unknown git_branch=unknown ****
Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... 
torch 1.8, cuda 11.1 -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 ninja .................. [OKAY] **** Git info for Megatron: git_hash=unknown git_branch=unknown ******** Git info for Megatron: git_hash=unknown git_branch=unknown **** -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] ninja .................. [OKAY] sparse_attn ............ [NO] ....... [OKAY] -------------------------------------------------- transformer ............ [NO] ....... [OKAY] op name ................ installed .. compatible -------------------------------------------------- stochastic_transformer . [NO] ....... [OKAY] cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ 
[NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] ninja .................. [OKAY] fused_lamb ............. [NO] ....... [OKAY] -------------------------------------------------- sparse_attn ............ [NO] ....... [OKAY] op name ................ installed .. compatible -------------------------------------------------- transformer ............ [NO] ....... [OKAY] cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] DeepSpeed general environment info: -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- torch install path ............... 
['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] JIT compiled ops requires ninja torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] **** Git info for Megatron: git_hash=unknown git_branch=unknown **** torch version .................... 
1.8.1 torch cuda version ............... 11.1 **** Git info for Megatron: git_hash=unknown git_branch=unknown **** nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 ninja --------------------------------------------------.................. [OKAY] DeepSpeed C++/CUDA extension op report-------------------------------------------------- --------------------------------------------------op name NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op................. --------------------------------------------------installed JIT compiled ops requires ninja.. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 DeepSpeed general environment info:DeepSpeed general environment info:torch cuda version ............... 11.1 nvcc version torch install path.....................torch install path 11.2............... deepspeed install path............... ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']deepspeed info ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']................... 
torch version 0.4.2+bc17042, bc17042, big-sciencetorch version.................... deepspeed wheel compiled w.....................1.8.1 ......1.8.1 torch cuda versiontorch 1.8, cuda 11.1 torch cuda version............... ...............11.1 11.1nvcc version nvcc version..................... .....................11.2 11.2deepspeed install path deepspeed install path........... ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed infodeepspeed info ...................................... 0.4.2+bc17042, bc17042, big-science0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w.deepspeed wheel compiled w. ............ torch 1.8, cuda 11.1torch 1.8, cuda 11.1 ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... 
torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... DeepSpeed general environment info:0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 async_io ............... [NO] ....... [NO] transformer_inference .. 
[NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: DeepSpeed general environment info: torch install path ............... torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] .................... 1.8.1 torch version ....................torch cuda version ...............1.8.1 11.1 torch cuda versionnvcc version .................................... 11.211.1 deepspeed install pathnvcc version ................................ 11.2['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed install pathdeepspeed info .............................. ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']0.4.2+bc17042, bc17042, big-science deepspeed infodeepspeed wheel compiled w. ......................... 
0.4.2+bc17042, bc17042, big-sciencetorch 1.8, cuda 11.1 deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown ****  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 
0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown ******** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... 
['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. 
Can be fixed by: `apt install libaio-dev`. DeepSpeed general environment info: async_io ............... [NO] ....... [NO] torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] transformer_inference .. [NO] ....... [OKAY] torch version .................... 1.8.1 torch cuda version ............... 11.1 utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science -------------------------------------------------- deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 
11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown ****  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. 
--------------------------------------------------
DeepSpeed C++/CUDA extension op report
--------------------------------------------------
NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.
--------------------------------------------------
JIT compiled ops requires ninja
ninja .................. [OKAY]
--------------------------------------------------
op name ................ installed .. compatible
--------------------------------------------------
cpu_adam ............... [YES] ...... [OKAY]
fused_adam ............. [NO] ....... [OKAY]
fused_lamb ............. [NO] ....... [OKAY]
sparse_attn ............ [NO] ....... [OKAY]
transformer ............ [NO] ....... [OKAY]
stochastic_transformer . [NO] ....... [OKAY]
--------------------------------------------------
 [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`.
async_io ............... [NO] ....... [NO]
transformer_inference .. [NO] ....... [OKAY]
utils .................. [YES] ...... [OKAY]
quantizer .............. [NO] ....... [OKAY]
--------------------------------------------------
DeepSpeed general environment info:
torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']
torch version .................... 1.8.1
torch cuda version ............... 11.1
nvcc version ..................... 11.2
deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']
deepspeed info ................... 0.4.2+bc17042, bc17042, big-science
deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1
/bin/sh: line 0: type: git: not found
**** Git info for Megatron: git_hash=unknown git_branch=unknown ****
Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown ****  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- ninja .................. 
[OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown ******** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- **** Git info for Megatron: git_hash=unknown git_branch=unknown ******** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown ****  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... 
[OKAY] -------------------------------------------------- -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO]ninja ....... ..................[OKAY] [OKAY] transformer ............ [NO] --------------------------------------------------....... [OKAY]op name ................ installedstochastic_transformer .. 
.compatible [NO]-------------------------------------------------- ....... [OKAY] cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science DeepSpeed general environment info: deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. 
Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown ****  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown ****  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info: torch install path ............... DeepSpeed general environment info: ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch install pathtorch version ................................... 1.8.1 torch cuda version ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']11.1 nvcc versiontorch version ......................................... 11.21.8.1 deepspeed install path torch cuda version........... ............... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']11.1 deepspeed infonvcc version ........................................ 
0.4.2+bc17042, bc17042, big-science11.2 deepspeed wheel compiled w.deepspeed install path ................. torch 1.8, cuda 11.1 ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... 
[OKAY] -------------------------------------------------- /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 **** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info: DeepSpeed general environment info:torch install path ............... torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'].................... 1.8.1 torch version ....................torch cuda version 1.8.1............... 11.1 torch cuda versionnvcc version .................................... 11.111.2 nvcc versiondeepspeed install path ................................ 11.2 ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']deepspeed install path ...........deepspeed info ................... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']0.4.2+bc17042, bc17042, big-science deepspeed infodeepspeed wheel compiled w. ......................... 
0.4.2+bc17042, bc17042, big-sciencetorch 1.8, cuda 11.1 deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_ioasync_io .............................. [NO][NO] .............. [NO][NO] transformer_inference ..transformer_inference [NO].. .......[NO] [OKAY]....... [OKAY] utils ..................utils [YES].................. ......[YES] [OKAY]...... [OKAY] quantizer ..............quantizer [NO].............. .......[NO] [OKAY]....... [OKAY] -------------------------------------------------- -------------------------------------------------- DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... 
['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info:DeepSpeed general environment info: torch install pathtorch install path ............... ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version torch version.................... ....................1.8.1 1.8.1 torch cuda version torch cuda version............... ...............11.1 11.1nvcc version nvcc version..................... .....................11.2 11.2deepspeed install path deepspeed install path........... ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed infodeepspeed info ...................................... 0.4.2+bc17042, bc17042, big-science0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w.deepspeed wheel compiled w. ............ torch 1.8, cuda 11.1torch 1.8, cuda 11.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. transformer_inference .. [NO] ....... [OKAY] async_ioutils ................................. [NO][YES] ............. [NO][OKAY] quantizer .............. [NO] ....... 
[OKAY] transformer_inference-------------------------------------------------- .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed general environment info:DeepSpeed general environment info: torch install pathtorch install path .............................. ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch versiontorch version ........................................ 1.8.11.8.1 torch cuda versiontorch cuda version .............................. 11.111.1 nvcc versionnvcc version .......................................... 11.211.2 deepspeed install pathdeepspeed install path ...................... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed infodeepspeed info ...................................... 0.4.2+bc17042, bc17042, big-science 0.4.2+bc17042, bc17042, big-sciencedeepspeed wheel compiled w. deepspeed wheel compiled w....... torch 1.8, cuda 11.1...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... 
['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... 
['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info: DeepSpeed general environment info:torch install path ............... torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version ....................['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] 1.8.1 torch versiontorch cuda version ................................... 1.8.111.1 nvcc versiontorch cuda version .................................... 11.211.1 deepspeed install pathnvcc version ................................ 11.2['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed install path deepspeed info........... ................... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']0.4.2+bc17042, bc17042, big-science deepspeed infodeepspeed wheel compiled w. ......................... torch 1.8, cuda 11.1 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. 
async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- /bin/sh: line 0: type: git: not found  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] **** Git info for Megatron: git_hash=unknown git_branch=unknown **** utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... 
['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']
torch version .................... 1.8.1
torch cuda version ............... 11.1
nvcc version ..................... 11.2
deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']
deepspeed info ................... 0.4.2+bc17042, bc17042, big-science
deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1
/bin/sh: line 0: type: git: not found
**** Git info for Megatron: git_hash=unknown git_branch=unknown ****
 [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`.
async_io ............... [NO] ....... [NO]
transformer_inference .. [NO] ....... [OKAY]
utils .................. [YES] ...... [OKAY]
quantizer .............. [NO] ....... [OKAY]
--------------------------------------------------
using world size: 256, data-parallel-size: 8, tensor-model-parallel size: 4, pipeline-model-parallel size: 8
using torch.float16 for parameters ...
------------------------ arguments ------------------------
accumulate_allreduce_grads_in_fp32 .............. False
adam_beta1 ...................................... 0.9
adam_beta2 ...................................... 0.999
adam_eps ........................................ 1e-08
adlr_autoresume ................................. False
adlr_autoresume_interval ........................ 1000
apply_query_key_layer_scaling ................... True
apply_residual_connection_post_layernorm ........ False
attention_dropout ............................... 0.1
attention_softmax_in_fp32 ....................... False
bert_binary_head ................................ True
bert_load ....................................... None
bf16 ............................................ False
bias_dropout_fusion ............................. True
bias_gelu_fusion ................................ True
biencoder_projection_dim ........................ 0
biencoder_shared_query_context_model ............ False
block_data_path ................................. None
checkpoint_activations .......................... True
checkpoint_in_cpu ............................... False
checkpoint_num_layers ........................... 1
clip_grad ....................................... 1.0
codecarbon_dir .................................. /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/tr8-104B-logs/codecarbon
consumed_train_samples .......................... 0
consumed_valid_samples .......................... 0
contigious_checkpointing ........................ False
cpu_optimizer ................................... False
cpu_torch_adam .................................. False
data_impl ....................................... mmap
data_parallel_size .............................. 8
data_path ....................................... ['/gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document']
dataloader_type ................................. single
DDP_impl ........................................ local
decoder_seq_length .............................. None
deepscale ....................................... False
deepscale_config ................................ None
deepspeed ....................................... True
deepspeed_activation_checkpointing .............. True
deepspeed_config ................................ ./ds_config.1186600.json
deepspeed_mpi ................................... False
distribute_checkpointed_activations ............. False
distributed_backend ............................. nccl
embedding_path .................................. None
encoder_seq_length .............................. 2048
eod_mask_loss ................................... False
eval_interval ................................... 1000
eval_iters ...................................... 5
evidence_data_path .............................. None
exit_duration_in_mins ........................... 1190
exit_interval ................................... None
ffn_hidden_size ................................. 20480
finetune ........................................ False
fp16 ............................................ True
fp16_lm_cross_entropy ........................... False
fp32_residual_connection ........................ False
global_batch_size ............................... 2048
hidden_dropout .................................. 0.1
hidden_size ..................................... 16384
hysteresis ...................................... 2
ict_head_size ................................... None
ict_load ........................................ None
img_dim ......................................... 224
indexer_batch_size .............................. 128
indexer_log_interval ............................ 1000
init_method_std ................................. 0.02
init_method_xavier_uniform ...................... False
initial_loss_scale .............................. 4294967296
kv_channels ..................................... 512
layernorm_epsilon ............................... 1e-05
lazy_mpu_init ................................... None
load ............................................ /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints
local_rank ...................................... 0
log_batch_size_to_tensorboard ................... True
log_interval .................................... 10
log_learning_rate_to_tensorboard ................ True
log_loss_scale_to_tensorboard ................... True
log_num_zeros_in_grad ........................... False
log_params_norm ................................. False
log_timers_to_tensorboard ....................... True
log_validation_ppl_to_tensorboard ............... True
loss_scale ...................................... 12.0
loss_scale_window ............................... 1000
lr .............................................. 6e-05
lr_decay_iters .................................. None
lr_decay_samples ................................ 126953125
lr_decay_style .................................. cosine
lr_warmup_fraction .............................. None
lr_warmup_iters ................................. 0
lr_warmup_samples ............................... 216320
make_vocab_size_divisible_by .................... 128
mask_prob ....................................... 0.15
masked_softmax_fusion ........................... True
max_position_embeddings ......................... 2048
memory_centric_tiled_linear ..................... False
merge_file ...................................... /gpfswork/rech/six/commun/code/tr8-104B/Megatron-DeepSpeed-tr8-104B/data/gpt2-merges.txt
micro_batch_size ................................ 1
min_loss_scale .................................. 1.0
min_lr .......................................... 6e-06
mmap_warmup ..................................... False
no_load_optim ................................... None
no_load_rng ..................................... None
no_save_optim ................................... None
no_save_rng ..................................... None
num_attention_heads ............................. 32
num_channels .................................... 3
num_classes ..................................... 1000
num_layers ...................................... 32
num_layers_per_virtual_pipeline_stage ........... None
num_workers ..................................... 2
onnx_safe ....................................... None
openai_gelu ..................................... False
optimizer ....................................... adam
override_lr_scheduler ........................... False
params_dtype .................................... torch.float16
partition_activations ........................... False
patch_dim ....................................... 16
pipeline_model_parallel_size .................... 8
position_embedding_type ......................... PositionEmbeddingType.absolute
profile_backward ................................ False
query_in_block_prob ............................. 0.1
rampup_batch_size ............................... ['16', '16', '6_000_000']
rank ............................................ 0
remote_device ................................... none
reset_attention_mask ............................ False
reset_position_ids .............................. False
retriever_report_topk_accuracies ................ []
retriever_score_scaling ......................... False
retriever_seq_length ............................ 256
sample_rate ..................................... 1.0
save ............................................ /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints
save_interval ................................... 1500
scatter_gather_tensors_in_pipeline .............. True
scattered_embeddings ............................ False
seed ............................................ 42
seq_length ...................................... 2048
sgd_momentum .................................... 0.9
short_seq_prob .................................. 0.1
split ........................................... 949,50,1
split_transformers .............................. False
synchronize_each_layer .......................... False
tensor_model_parallel_size ...................... 4
tensorboard_dir ................................. /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/tr8-104B-logs/tensorboard
tensorboard_log_interval ........................ 1
tensorboard_queue_size .......................... 5
tile_factor ..................................... 1
titles_data_path ................................ None
tokenizer_name_or_path .......................... None
tokenizer_type .................................. GPT2BPETokenizer
train_iters ..................................... None
train_samples ................................... 300000000
use_checkpoint_lr_scheduler ..................... False
use_contiguous_buffers_in_ddp ................... False
use_cpu_initialization .......................... None
use_one_sent_docs ............................... False
use_pin_memory .................................. False
virtual_pipeline_model_parallel_size ............ None
vocab_extra_ids ................................. 0
vocab_file ...................................... /gpfswork/rech/six/commun/code/tr8-104B/Megatron-DeepSpeed-tr8-104B/data/gpt2-vocab.json
weight_decay .................................... 0.1
world_size ...................................... 256
zero_allgather_bucket_size ...................... 0.0
zero_contigious_gradients ....................... False
zero_reduce_bucket_size ......................... 0.0
zero_reduce_scatter ............................. False
zero_stage ...................................... 1
-------------------- end of arguments ---------------------
will use batch size rampup starting from global batch size 16 to global batch size 2048 with batch size increments 16 over 6000000 samples.
> building GPT2BPETokenizer tokenizer ...
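The log announces a batch-size rampup from 16 to 2048 in increments of 16 spread over 6,000,000 samples (`rampup_batch_size ['16', '16', '6_000_000']`). The exact schedule isn't printed; a minimal sketch, assuming evenly spaced plateaus (the helper name `rampup_global_batch_size` is ours, not Megatron's):

```python
def rampup_global_batch_size(consumed_samples: int,
                             start: int = 16,
                             increment: int = 16,
                             final: int = 2048,
                             rampup_samples: int = 6_000_000) -> int:
    """Piecewise-constant ramp: grow the global batch size from `start`
    to `final` in steps of `increment`, evenly over `rampup_samples`."""
    # Number of discrete batch-size increases between start and final.
    n_increments = (final - start) // increment        # 127 here
    # Samples consumed on each plateau before stepping up again.
    samples_per_step = rampup_samples // n_increments  # ~47k here
    step = min(consumed_samples // samples_per_step, n_increments)
    return start + step * increment

print(rampup_global_batch_size(0))          # 16
print(rampup_global_batch_size(6_000_000))  # 2048
```

Under this reading, the run spends roughly 47k samples at each of 127 intermediate batch sizes before settling at the final global batch size of 2048.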
--------------------------------------------------
DeepSpeed C++/CUDA extension op report
--------------------------------------------------
NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.
--------------------------------------------------
JIT compiled ops requires ninja
ninja .................. [OKAY]
--------------------------------------------------
op name ................ installed .. compatible
--------------------------------------------------
cpu_adam ............... [YES] ...... [OKAY]
fused_adam ............. [NO] ....... [OKAY]
fused_lamb ............. [NO] ....... [OKAY]
sparse_attn ............ [NO] ....... [OKAY]
transformer ............ [NO] ....... [OKAY]
stochastic_transformer . [NO] ....... [OKAY]
--------------------------------------------------
> padded vocab (size: 50257) with 431 dummy tokens (new size: 50688)
> setting tensorboard ...
> setting codecarbon ...
> initializing torch distributed ...
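The padded-vocab line above is plain arithmetic: with `make_vocab_size_divisible_by 128` and `tensor_model_parallel_size 4`, the vocab is rounded up to a multiple of 128 × 4 = 512 so each tensor-parallel shard of the embedding gets an equal number of rows. A sketch of that rule (helper name ours), consistent with the numbers in the log:

```python
import math

def pad_vocab_size(orig_vocab_size: int,
                   make_vocab_size_divisible_by: int,
                   tensor_model_parallel_size: int) -> int:
    """Round the vocab up so the embedding splits evenly across
    tensor-parallel ranks in divisible-by-128 chunks."""
    multiple = make_vocab_size_divisible_by * tensor_model_parallel_size
    return math.ceil(orig_vocab_size / multiple) * multiple

padded = pad_vocab_size(50257, 128, 4)
print(padded, padded - 50257)  # 50688 431
```

This reproduces the logged values: 50257 is padded with 431 dummy tokens to 50688 (= 99 × 512).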
Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. 
[OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown ****  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** -------------------------------------------------- DeepSpeed C++/CUDA extension op report-------------------------------------------------- -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. 
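The `git_hash=unknown` lines are not an error: the compute nodes' shell simply has no `git` on PATH (hence `/bin/sh: line 0: type: git: not found`), so Megatron falls back to "unknown". A sketch of that probe-and-fallback logic (not Megatron's exact code):

```python
# Sketch of the probe behind "**** Git info for Megatron: git_hash=unknown ****":
# if git is not on PATH in the job environment, or the cwd is not a repo,
# report "unknown" instead of failing.
import shutil
import subprocess

def git_hash() -> str:
    if shutil.which("git") is None:  # mirrors `/bin/sh: type: git: not found`
        return "unknown"
    try:
        out = subprocess.run(
            ["git", "rev-parse", "--short", "HEAD"],
            capture_output=True, text=True, check=True,
        )
        return out.stdout.strip()
    except subprocess.CalledProcessError:
        return "unknown"
```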
> initializing tensor model parallel with size 4
> initializing pipeline model parallel with size 8
> setting random seeds to 42 ...
[2021-09-25 04:27:14,118] [INFO] [checkpointing.py:226:model_parallel_cuda_manual_seed] > initializing model parallel cuda seeds on global rank 0, model parallel rank 0, and data parallel rank 0 with model parallel seed: 2760 and data parallel seed: 42
> compiling dataset index builder ...
make: Entering directory '/gpfsssd/worksf/projects/rech/six/commun/code/tr8-104B/Megatron-DeepSpeed-tr8-104B/megatron/data'
make: Nothing to be done for 'default'.
make: Leaving directory '/gpfsssd/worksf/projects/rech/six/commun/code/tr8-104B/Megatron-DeepSpeed-tr8-104B/megatron/data'
>>> done with dataset index builder. Compilation time: 0.302 seconds
> compiling and loading fused kernels ...
/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning: !! WARNING !!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension.
See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source.
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!! WARNING !!
warnings.warn(WRONG_COMPILER_WARNING.format(
Detected CUDA files, patching ldflags
Emitting ninja build file /gpfsssd/worksf/projects/rech/six/commun/code/tr8-104B/Megatron-DeepSpeed-tr8-104B/megatron/fused_kernels/build/build.ninja...
Building extension module scaled_upper_triang_masked_softmax_cuda...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
Loading extension module scaled_upper_triang_masked_softmax_cuda...
Emitting ninja build file /gpfsssd/worksf/projects/rech/six/commun/code/tr8-104B/Megatron-DeepSpeed-tr8-104B/megatron/fused_kernels/build/build.ninja...
Building extension module scaled_masked_softmax_cuda...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
Loading extension module scaled_masked_softmax_cuda...
Emitting ninja build file /gpfsssd/worksf/projects/rech/six/commun/code/tr8-104B/Megatron-DeepSpeed-tr8-104B/megatron/fused_kernels/build/build.ninja...
Building extension module fused_mix_prec_layer_norm_cuda...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
Loading extension module fused_mix_prec_layer_norm_cuda...
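The `model parallel seed: 2760 and data parallel seed: 42` line is consistent with Megatron's seeding scheme in `model_parallel_cuda_manual_seed`: each tensor-parallel rank gets the base seed plus a fixed offset of 2718 plus its rank, while data-parallel replicas keep the base seed. A sketch of that arithmetic (the 2718 constant is Megatron's hard-coded offset; verify against your checkout):

```python
# Sketch of the seed derivation behind the checkpointing.py log line.
def megatron_seeds(seed: int, tp_rank: int) -> tuple:
    model_parallel_seed = seed + 2718 + tp_rank  # distinct per tensor-parallel rank
    data_parallel_seed = seed                    # shared across data-parallel replicas
    return model_parallel_seed, data_parallel_seed

# For the run above: base seed 42, tensor-parallel rank 0
print(megatron_seeds(42, 0))  # -> (2760, 42), matching the log
```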
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !! WARNING !! warnings.warn(WRONG_COMPILER_WARNING.format( /gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning: !! WARNING !! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension. See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source. !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !! WARNING !! warnings.warn(WRONG_COMPILER_WARNING.format( /gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning: !! WARNING !! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension. See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source. !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !! WARNING !! warnings.warn(WRONG_COMPILER_WARNING.format( /gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning: !! WARNING !! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. 
Please use g++ to to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension. See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source. !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !! WARNING !! warnings.warn(WRONG_COMPILER_WARNING.format( /gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning: !! WARNING !! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension. See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source. !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !! WARNING !! warnings.warn(WRONG_COMPILER_WARNING.format( /gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning: !! WARNING !! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension. See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source. !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !! WARNING !! 
warnings.warn(WRONG_COMPILER_WARNING.format( /gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning: !! WARNING !! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension. See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source. !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !! WARNING !! warnings.warn(WRONG_COMPILER_WARNING.format( /gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning: !! WARNING !! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension. See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source. !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !! WARNING !! warnings.warn(WRONG_COMPILER_WARNING.format( /gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning: !! WARNING !! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to to compile your extension. 
Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension. See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source. !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !! WARNING !! warnings.warn(WRONG_COMPILER_WARNING.format( /gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning: !! WARNING !! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension. See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source. !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !! WARNING !! warnings.warn(WRONG_COMPILER_WARNING.format( /gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning: !! WARNING !! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension. See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source. !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !! WARNING !! warnings.warn(WRONG_COMPILER_WARNING.format( /gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning: !! WARNING !! 
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension. See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source. !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !! WARNING !! warnings.warn(WRONG_COMPILER_WARNING.format( /gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning: !! WARNING !! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension. See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source. !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !! WARNING !! warnings.warn(WRONG_COMPILER_WARNING.format( /gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning: !! WARNING !! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension. See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source. 
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !! WARNING !! warnings.warn(WRONG_COMPILER_WARNING.format( /gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning: !! WARNING !! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension. See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source. !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !! WARNING !! warnings.warn(WRONG_COMPILER_WARNING.format( >>> done with compiling and loading fused kernels. Compilation time: 23.417 seconds time to initialize megatron (seconds): 25.843 [after megatron is initialized] datetime: 2021-09-25 04:27:37 building GPT model ... 
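The `Using topology` dump that follows enumerates all 256 training ranks explicitly. As a minimal sketch of that ordering (the helper `rank_of` is hypothetical, not the Megatron-DeepSpeed source; it assumes the grid assigns the tensor/`model` axis fastest, then `data`, then `pipe`, which matches the entries in the log):

```python
# Hypothetical sketch of the (pipe, data, model) -> global rank mapping
# printed in the "Using topology" log line below.
from itertools import product

PP, DP, TP = 8, 8, 4  # pipeline, data, tensor parallel degrees: 8*8*4 = 256 GPUs

def rank_of(pipe: int, data: int, model: int) -> int:
    """Global rank for a coordinate, with `model` varying fastest, then `data`, then `pipe`."""
    return (pipe * DP + data) * TP + model

# Rebuild the full mapping in log order and spot-check entries against the dump.
topology = {(p, d, m): rank_of(p, d, m)
            for p, d, m in product(range(PP), range(DP), range(TP))}

assert topology[(0, 0, 0)] == 0    # first entry in the log
assert topology[(0, 1, 0)] == 4    # next data-parallel replica starts at rank 4
assert topology[(2, 2, 3)] == 75   # matches ProcessCoord(pipe=2, data=2, model=3): 75
assert topology[(7, 7, 3)] == 255  # last entry in the log
assert len(topology) == 256
```

Spot-checking a few coordinates against the dump confirms the ordering is fully regular.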
[2021-09-25 04:27:37,906] [INFO] [utils.py:680:see_memory_usage] Before Building Model
/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch/cuda/memory.py:373: FutureWarning: torch.cuda.memory_cached has been renamed to torch.cuda.memory_reserved
  warnings.warn(
/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch/cuda/memory.py:381: FutureWarning: torch.cuda.max_memory_cached has been renamed to torch.cuda.max_memory_reserved
  warnings.warn(
[2021-09-25 04:27:37,908] [INFO] [utils.py:681:see_memory_usage] MA 0.0 GB  Max_MA 0.0 GB  CA 0.0 GB  Max_CA 0 GB
[2021-09-25 04:27:37,908] [INFO] [utils.py:689:see_memory_usage] CPU Virtual Memory: used = 36.69 GB, percent = 19.6%
SEED_LAYERS=False BASE_SEED=1234 SEED_FN=None
Using topology: {ProcessCoord(pipe=0, data=0, model=0): 0, ProcessCoord(pipe=0, data=0, model=1): 1, ..., ProcessCoord(pipe=7, data=7, model=3): 255} (the full 256-entry mapping is regular: rank = 32*pipe + 4*data + model, with pipe and data in 0..7 and model in 0..3)
[2021-09-25 04:27:39,312] [INFO] [module.py:360:_partition_layers] Partitioning pipeline stages with method type:transformer
stage=0 layers=7
     0: _to_float16
     1: EmbeddingPipe
     2:
     3: ParallelTransformerLayerPipe
     4: ParallelTransformerLayerPipe
     5: ParallelTransformerLayerPipe
     6: ParallelTransformerLayerPipe
stage=1 layers=4
     7-10: ParallelTransformerLayerPipe
stage=2 layers=4
    11-14: ParallelTransformerLayerPipe
stage=3 layers=4
    15-18: ParallelTransformerLayerPipe
stage=4 layers=4
    19-22: ParallelTransformerLayerPipe
stage=5 layers=4
    23-26: ParallelTransformerLayerPipe
stage=6 layers=4
    27-30: ParallelTransformerLayerPipe
stage=7 layers=8
    31: ParallelTransformerLayerPipe
    32: ParallelTransformerLayerPipe
    33: ParallelTransformerLayerPipe
    34: ParallelTransformerLayerPipe
    35:
    36: MixedFusedLayerNorm
    37: EmbeddingPipe
    38: float16_to_fp32
  loss: CrossEntropy
> number of parameters on (tensor, pipeline) model parallel ranks:
>   pipeline stages 1-6, tensor ranks 0-3 (24 ranks): 1745293312 each
>   pipeline stage 0, tensor ranks 1-3: 1986465792 each
>   pipeline stage 7, tensor ranks 0-3: 1986498560 each
[2021-09-25 04:27:40,518] [INFO] [utils.py:680:see_memory_usage] After Building Model
[2021-09-25 04:27:40,519] [INFO] [utils.py:681:see_memory_usage] MA 3.77 GB  Max_MA 3.79 GB  CA 3.79 GB  Max_CA 4 GB
[2021-09-25 04:27:40,519] [INFO] [utils.py:689:see_memory_usage] CPU Virtual Memory: used = 36.87 GB, percent = 19.7%
> number of parameters on (tensor, pipeline) model parallel rank (0, 0): 1986465792
setting training iterations to 159576
> learning rate decay style: cosine
DeepSpeed is enabled.
[2021-09-25 04:27:40,540] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed info: version=0.4.2+bc17042, git-hash=bc17042, git-branch=big-science
[2021-09-25 04:27:40,690] [INFO] [engine.py:179:__init__] DeepSpeed Flops Profiler Enabled: False
[2021-09-25 04:27:40,690] [INFO] [engine.py:736:_configure_optimizer] Removing param_group that has no 'params' in the client Optimizer
[2021-09-25 04:27:40,690] [INFO] [engine.py:741:_configure_optimizer] Using client Optimizer as basic optimizer
[2021-09-25 04:27:40,690] [INFO] [engine.py:750:_configure_optimizer] DeepSpeed Basic Optimizer = FusedAdam
[2021-09-25 04:27:40,690] [INFO] [utils.py:43:is_zero_supported_optimizer] Checking ZeRO support for optimizer=FusedAdam type=
[2021-09-25 04:27:40,690] [INFO] [logging.py:68:log_dist] [Rank 0] Creating fp16 ZeRO stage 1 optimizer
[2021-09-25 04:27:40,690] [INFO] [stage2.py:106:__init__] Reduce bucket size 500000000
[2021-09-25 04:27:40,690] [INFO] [stage2.py:107:__init__] Allgather bucket size 500000000
[2021-09-25 04:27:40,691] [INFO] [stage2.py:108:__init__] CPU Offload: False
[2021-09-25 04:27:40,691] [INFO] [stage2.py:109:__init__] Round robin gradient partitioning: False
[2021-09-25 04:27:45,267] [INFO] [stage2.py:419:__init__] optimizer state initialized
[2021-09-25 04:27:45,267] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed Final Optimizer = FusedAdam
[2021-09-25 04:27:45,267] [INFO] [engine.py:553:_configure_lr_scheduler] DeepSpeed using client LR scheduler
[2021-09-25 04:27:45,267] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed LR Scheduler =
[2021-09-25 04:27:45,267] [INFO] [logging.py:68:log_dist] [Rank 0] step=0, skipped=0, lr=[0.0, 0.0], mom=[(0.9, 0.999), (0.9, 0.999)]
[2021-09-25 04:27:45,267] [INFO] [config.py:900:print] DeepSpeedEngine configuration:
[2021-09-25 04:27:45,268] [INFO] [config.py:904:print] activation_checkpointing_config {
    "partition_activations": false,
    "contiguous_memory_optimization": false,
    "cpu_checkpointing": false,
    "number_checkpoints": null,
    "synchronize_checkpoint_boundary": false,
    "profile": false
}
[2021-09-25 04:27:45,268] [INFO] [config.py:904:print] aio_config ................... {'block_size': 1048576, 'queue_depth': 8, 'thread_count': 1, 'single_submit': False, 'overlap_events': True}
[2021-09-25 04:27:45,268] [INFO] [config.py:904:print] allreduce_always_fp32 ........ False
[2021-09-25 04:27:45,268] [INFO] [config.py:904:print] amp_enabled .................. False
[2021-09-25 04:27:45,268] [INFO] [config.py:904:print] amp_params ................... False
[2021-09-25 04:27:45,268] [INFO] [config.py:904:print] checkpoint_tag_validation_enabled  True
[2021-09-25 04:27:45,268] [INFO] [config.py:904:print] checkpoint_tag_validation_fail  False
[2021-09-25 04:27:45,268] [INFO] [config.py:904:print] disable_allgather ............ False
[2021-09-25 04:27:45,268] [INFO] [config.py:904:print] dump_state ................... False
[2021-09-25 04:27:45,268] [INFO] [config.py:904:print] dynamic_loss_scale_args ...... {'init_scale': 4096, 'scale_window': 500, 'delayed_shift': 2, 'min_scale': 1}
[2021-09-25 04:27:45,268] [INFO] [config.py:904:print] eigenvalue_enabled ........... False
[2021-09-25 04:27:45,268] [INFO] [config.py:904:print] eigenvalue_gas_boundary_resolution  1
[2021-09-25 04:27:45,268] [INFO] [config.py:904:print] eigenvalue_layer_name ........ bert.encoder.layer
[2021-09-25 04:27:45,268] [INFO] [config.py:904:print] eigenvalue_layer_num ......... 0
[2021-09-25 04:27:45,268] [INFO] [config.py:904:print] eigenvalue_max_iter .......... 100
[2021-09-25 04:27:45,268] [INFO] [config.py:904:print] eigenvalue_stability ......... 1e-06
[2021-09-25 04:27:45,268] [INFO] [config.py:904:print] eigenvalue_tol ............... 0.01
[2021-09-25 04:27:45,268] [INFO] [config.py:904:print] eigenvalue_verbose ........... False
[2021-09-25 04:27:45,268] [INFO] [config.py:904:print] elasticity_enabled ........... False
[2021-09-25 04:27:45,268] [INFO] [config.py:904:print] flops_profiler_config ........ {
    "enabled": false,
    "profile_step": 1,
    "module_depth": -1,
    "top_modules": 1,
    "detailed": true,
    "output_file": null
}
[2021-09-25 04:27:45,268] [INFO] [config.py:904:print] fp16_enabled ................. True
[2021-09-25 04:27:45,268] [INFO] [config.py:904:print] fp16_mixed_quantize .......... False
[2021-09-25 04:27:45,268] [INFO] [config.py:904:print] global_rank .................. 0
[2021-09-25 04:27:45,268] [INFO] [config.py:904:print] gradient_accumulation_steps .. 256
[2021-09-25 04:27:45,269] [INFO] [config.py:904:print] gradient_clipping ............ 1.0
[2021-09-25 04:27:45,269] [INFO] [config.py:904:print] gradient_predivide_factor .... 1.0
[2021-09-25 04:27:45,269] [INFO] [config.py:904:print] initial_dynamic_scale ........ 4096
[2021-09-25 04:27:45,269] [INFO] [config.py:904:print] loss_scale ................... 0
[2021-09-25 04:27:45,269] [INFO] [config.py:904:print] memory_breakdown ............. False
[2021-09-25 04:27:45,269] [INFO] [config.py:904:print] optimizer_legacy_fusion ...... False
[2021-09-25 04:27:45,269] [INFO] [config.py:904:print] optimizer_name ............... None
[2021-09-25 04:27:45,269] [INFO] [config.py:904:print] optimizer_params ............. None
[2021-09-25 04:27:45,269] [INFO] [config.py:904:print] pipeline .....................
{'stages': 'auto', 'partition': 'best', 'seed_layers': False, 'activation_checkpoint_interval': 0} [2021-09-25 04:27:45,269] [INFO] [config.py:904:print] pld_enabled .................. False [2021-09-25 04:27:45,269] [INFO] [config.py:904:print] pld_params ................... False [2021-09-25 04:27:45,269] [INFO] [config.py:904:print] prescale_gradients ........... False [2021-09-25 04:27:45,269] [INFO] [config.py:904:print] quantize_change_rate ......... 0.001 [2021-09-25 04:27:45,269] [INFO] [config.py:904:print] quantize_groups .............. 1 [2021-09-25 04:27:45,269] [INFO] [config.py:904:print] quantize_offset .............. 1000 [2021-09-25 04:27:45,269] [INFO] [config.py:904:print] quantize_period .............. 1000 [2021-09-25 04:27:45,269] [INFO] [config.py:904:print] quantize_rounding ............ 0 [2021-09-25 04:27:45,269] [INFO] [config.py:904:print] quantize_start_bits .......... 16 [2021-09-25 04:27:45,269] [INFO] [config.py:904:print] quantize_target_bits ......... 8 [2021-09-25 04:27:45,269] [INFO] [config.py:904:print] quantize_training_enabled .... False [2021-09-25 04:27:45,269] [INFO] [config.py:904:print] quantize_type ................ 0 [2021-09-25 04:27:45,269] [INFO] [config.py:904:print] quantize_verbose ............. False [2021-09-25 04:27:45,269] [INFO] [config.py:904:print] scheduler_name ............... None [2021-09-25 04:27:45,269] [INFO] [config.py:904:print] scheduler_params ............. None [2021-09-25 04:27:45,269] [INFO] [config.py:904:print] sparse_attention ............. None [2021-09-25 04:27:45,269] [INFO] [config.py:904:print] sparse_gradients_enabled ..... False [2021-09-25 04:27:45,269] [INFO] [config.py:904:print] steps_per_print .............. 2000 [2021-09-25 04:27:45,269] [INFO] [config.py:904:print] tensorboard_enabled .......... False [2021-09-25 04:27:45,269] [INFO] [config.py:904:print] tensorboard_job_name ......... 
DeepSpeedJobName [2021-09-25 04:27:45,269] [INFO] [config.py:904:print] tensorboard_output_path ...... [2021-09-25 04:27:45,269] [INFO] [config.py:904:print] train_batch_size ............. 2048 [2021-09-25 04:27:45,269] [INFO] [config.py:904:print] train_micro_batch_size_per_gpu 1 [2021-09-25 04:27:45,269] [INFO] [config.py:904:print] use_quantizer_kernel ......... False [2021-09-25 04:27:45,269] [INFO] [config.py:904:print] wall_clock_breakdown ......... False [2021-09-25 04:27:45,269] [INFO] [config.py:904:print] world_size ................... 8 [2021-09-25 04:27:45,269] [INFO] [config.py:904:print] zero_allow_untested_optimizer False [2021-09-25 04:27:45,270] [INFO] [config.py:904:print] zero_config .................. { "stage": 1, "contiguous_gradients": false, "reduce_scatter": true, "reduce_bucket_size": 5.000000e+08, "allgather_partitions": true, "allgather_bucket_size": 5.000000e+08, "overlap_comm": false, "load_from_fp32_weights": true, "elastic_checkpoint": true, "offload_param": null, "offload_optimizer": null, "sub_group_size": 1.000000e+09, "prefetch_bucket_size": 5.000000e+07, "param_persistence_threshold": 1.000000e+05, "max_live_parameters": 1.000000e+09, "max_reuse_distance": 1.000000e+09, "gather_fp16_weights_on_model_save": false, "ignore_unused_parameters": true, "round_robin_gradients": false, "legacy_stage1": false } [2021-09-25 04:27:45,270] [INFO] [config.py:904:print] zero_enabled ................. True [2021-09-25 04:27:45,270] [INFO] [config.py:904:print] zero_optimization_stage ...... 
1 [2021-09-25 04:27:45,270] [INFO] [config.py:906:print] json = { "train_micro_batch_size_per_gpu": 1, "train_batch_size": 2.048000e+03, "gradient_clipping": 1.0, "zero_optimization": { "stage": 1 }, "fp16": { "enabled": true, "loss_scale": 0, "loss_scale_window": 500, "hysteresis": 2, "min_loss_scale": 1, "initial_scale_power": 12 }, "steps_per_print": 2.000000e+03, "wall_clock_breakdown": false } [2021-09-25 04:27:45,270] [INFO] [engine.py:76:__init__] CONFIG: micro_batches=256 micro_batch_size=1 [2021-09-25 04:27:45,575] [INFO] [engine.py:134:__init__] RANK=0 STAGE=0 LAYERS=7 [0, 7) STAGE_PARAMS=1986465792 (1986.466M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M) [2021-09-25 04:27:45,575] [INFO] [engine.py:134:__init__] RANK=2 STAGE=0 LAYERS=7 [0, 7) STAGE_PARAMS=1986465792 (1986.466M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M) [2021-09-25 04:27:45,575] [INFO] [engine.py:134:__init__] RANK=3 STAGE=0 LAYERS=7 [0, 7) STAGE_PARAMS=1986465792 (1986.466M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M) [2021-09-25 04:27:45,575] [INFO] [engine.py:134:__init__] RANK=1 STAGE=0 LAYERS=7 [0, 7) STAGE_PARAMS=1986465792 (1986.466M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M) [2021-09-25 04:27:45,575] [INFO] [engine.py:134:__init__] RANK=129 STAGE=4 LAYERS=4 [19, 23) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M) [2021-09-25 04:27:45,575] [INFO] [engine.py:134:__init__] RANK=131 STAGE=4 LAYERS=4 [19, 23) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M) [2021-09-25 04:27:45,575] [INFO] [engine.py:134:__init__] RANK=130 STAGE=4 LAYERS=4 [19, 23) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M) [2021-09-25 04:27:45,575] [INFO] [engine.py:134:__init__] RANK=128 
STAGE=4 LAYERS=4 [19, 23) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M) [2021-09-25 04:27:45,575] [INFO] [engine.py:134:__init__] RANK=97 STAGE=3 LAYERS=4 [15, 19) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M) [2021-09-25 04:27:45,575] [INFO] [engine.py:134:__init__] RANK=98 STAGE=3 LAYERS=4 [15, 19) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M) [2021-09-25 04:27:45,575] [INFO] [engine.py:134:__init__] RANK=96 STAGE=3 LAYERS=4 [15, 19) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M) [2021-09-25 04:27:45,575] [INFO] [engine.py:134:__init__] RANK=99 STAGE=3 LAYERS=4 [15, 19) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M) [2021-09-25 04:27:45,575] [INFO] [engine.py:134:__init__] RANK=195 STAGE=6 LAYERS=4 [27, 31) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M) [2021-09-25 04:27:45,575] [INFO] [engine.py:134:__init__] RANK=193 STAGE=6 LAYERS=4 [27, 31) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M) [2021-09-25 04:27:45,575] [INFO] [engine.py:134:__init__] RANK=194 STAGE=6 LAYERS=4 [27, 31) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M) [2021-09-25 04:27:45,575] [INFO] [engine.py:134:__init__] RANK=192 STAGE=6 LAYERS=4 [27, 31) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M) [2021-09-25 04:27:45,575] [INFO] [engine.py:134:__init__] RANK=65 STAGE=2 LAYERS=4 [11, 15) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M) [2021-09-25 
04:27:45,575] [INFO] [engine.py:134:__init__] RANK=64 STAGE=2 LAYERS=4 [11, 15) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M) [2021-09-25 04:27:45,575] [INFO] [engine.py:134:__init__] RANK=67 STAGE=2 LAYERS=4 [11, 15) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M) [2021-09-25 04:27:45,575] [INFO] [engine.py:134:__init__] RANK=66 STAGE=2 LAYERS=4 [11, 15) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M) [2021-09-25 04:27:45,575] [INFO] [engine.py:134:__init__] RANK=163 STAGE=5 LAYERS=4 [23, 27) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M) [2021-09-25 04:27:45,575] [INFO] [engine.py:134:__init__] RANK=160 STAGE=5 LAYERS=4 [23, 27) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M) [2021-09-25 04:27:45,575] [INFO] [engine.py:134:__init__] RANK=162 STAGE=5 LAYERS=4 [23, 27) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M) [2021-09-25 04:27:45,575] [INFO] [engine.py:134:__init__] RANK=161 STAGE=5 LAYERS=4 [23, 27) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M) [2021-09-25 04:27:45,575] [INFO] [engine.py:134:__init__] RANK=225 STAGE=7 LAYERS=8 [31, 39) STAGE_PARAMS=1986498560 (1986.499M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M) [2021-09-25 04:27:45,575] [INFO] [engine.py:134:__init__] RANK=226 STAGE=7 LAYERS=8 [31, 39) STAGE_PARAMS=1986498560 (1986.499M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M) [2021-09-25 04:27:45,575] [INFO] [engine.py:134:__init__] RANK=227 STAGE=7 LAYERS=8 [31, 39) STAGE_PARAMS=1986498560 (1986.499M) TOTAL_PARAMS=57778896896 
(57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M) [2021-09-25 04:27:45,575] [INFO] [engine.py:134:__init__] RANK=224 STAGE=7 LAYERS=8 [31, 39) STAGE_PARAMS=1986498560 (1986.499M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M) [2021-09-25 04:27:45,575] [INFO] [engine.py:134:__init__] RANK=32 STAGE=1 LAYERS=4 [7, 11) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M) [2021-09-25 04:27:45,575] [INFO] [engine.py:134:__init__] RANK=33 STAGE=1 LAYERS=4 [7, 11) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M) [2021-09-25 04:27:45,575] [INFO] [engine.py:134:__init__] RANK=34 STAGE=1 LAYERS=4 [7, 11) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M) [2021-09-25 04:27:45,575] [INFO] [engine.py:134:__init__] RANK=35 STAGE=1 LAYERS=4 [7, 11) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M) > using checkpoint value 6e-05 for learning rate > using checkpoint value 6e-06 for minimum learning rate > using checkpoint value 216320 for warmup iterations > using checkpoint value 126953125 for total number of iterations > using checkpoint value cosine for decay style successfully loaded 8 ZeRO state_dicts for rank 48 successfully loaded 8 ZeRO state_dicts for rank 156 successfully loaded 8 ZeRO state_dicts for rank 223 successfully loaded 8 ZeRO state_dicts for rank 220 successfully loaded 8 ZeRO state_dicts for rank 49 successfully loaded 8 ZeRO state_dicts for rank 142 successfully loaded 8 ZeRO state_dicts for rank 221 successfully loaded 8 ZeRO state_dicts for rank 75 successfully loaded 8 ZeRO state_dicts for rank 50 successfully loaded 8 ZeRO state_dicts for rank 158 successfully loaded 8 ZeRO state_dicts for rank 33 successfully loaded 8 ZeRO state_dicts for rank 184 successfully loaded 8 ZeRO 
state_dicts for rank 216 successfully loaded 8 ZeRO state_dicts for rank 215 successfully loaded 8 ZeRO state_dicts for rank 157 successfully loaded 8 ZeRO state_dicts for rank 141 successfully loaded 8 ZeRO state_dicts for rank 222 successfully loaded 8 ZeRO state_dicts for rank 108 successfully loaded 8 ZeRO state_dicts for rank 58 successfully loaded 8 ZeRO state_dicts for rank 104 successfully loaded 8 ZeRO state_dicts for rank 100 successfully loaded 8 ZeRO state_dicts for rank 213 successfully loaded 8 ZeRO state_dicts for rank 91 successfully loaded 8 ZeRO state_dicts for rank 89 successfully loaded 8 ZeRO state_dicts for rank 40 successfully loaded 8 ZeRO state_dicts for rank 35 successfully loaded 8 ZeRO state_dicts for rank 144 successfully loaded 8 ZeRO state_dicts for rank 140 successfully loaded 8 ZeRO state_dicts for rank 60 successfully loaded 8 ZeRO state_dicts for rank 73 successfully loaded 8 ZeRO state_dicts for rank 51 successfully loaded 8 ZeRO state_dicts for rank 72 successfully loaded 8 ZeRO state_dicts for rank 159 successfully loaded 8 ZeRO state_dicts for rank 212 successfully loaded 8 ZeRO state_dicts for rank 146 successfully loaded 8 ZeRO state_dicts for rank 214 successfully loaded 8 ZeRO state_dicts for rank 143 successfully loaded 8 ZeRO state_dicts for rank 164 successfully loaded 8 ZeRO state_dicts for rank 34 successfully loaded 8 ZeRO state_dicts for rank 52 successfully loaded 8 ZeRO state_dicts for rank 131 successfully loaded 8 ZeRO state_dicts for rank 132 successfully loaded 8 ZeRO state_dicts for rank 45 successfully loaded 8 ZeRO state_dicts for rank 211 successfully loaded 8 ZeRO state_dicts for rank 61 successfully loaded 8 ZeRO state_dicts for rank 154 successfully loaded 8 ZeRO state_dicts for rank 88 successfully loaded 8 ZeRO state_dicts for rank 87 successfully loaded 8 ZeRO state_dicts for rank 90 successfully loaded 8 ZeRO state_dicts for rank 84 successfully loaded 8 ZeRO state_dicts for rank 53 successfully 
loaded 8 ZeRO state_dicts for rank 116 successfully loaded 8 ZeRO state_dicts for rank 59 successfully loaded 8 ZeRO state_dicts for rank 128 successfully loaded 8 ZeRO state_dicts for rank 165 successfully loaded 8 ZeRO state_dicts for rank 192 successfully loaded 8 ZeRO state_dicts for rank 210 successfully loaded 8 ZeRO state_dicts for rank 129 successfully loaded 8 ZeRO state_dicts for rank 93 successfully loaded 8 ZeRO state_dicts for rank 62 successfully loaded 8 ZeRO state_dicts for rank 81 successfully loaded 8 ZeRO state_dicts for rank 203 successfully loaded 8 ZeRO state_dicts for rank 76 successfully loaded 8 ZeRO state_dicts for rank 127 successfully loaded 8 ZeRO state_dicts for rank 38 successfully loaded 8 ZeRO state_dicts for rank 160 successfully loaded 8 ZeRO state_dicts for rank 113 successfully loaded 8 ZeRO state_dicts for rank 63 successfully loaded 8 ZeRO state_dicts for rank 145 successfully loaded 8 ZeRO state_dicts for rank 36 successfully loaded 8 ZeRO state_dicts for rank 57 successfully loaded 8 ZeRO state_dicts for rank 99 successfully loaded 8 ZeRO state_dicts for rank 67 successfully loaded 8 ZeRO state_dicts for rank 32 successfully loaded 8 ZeRO state_dicts for rank 47 successfully loaded 8 ZeRO state_dicts for rank 147 successfully loaded 8 ZeRO state_dicts for rank 112 successfully loaded 8 ZeRO state_dicts for rank 150 successfully loaded 8 ZeRO state_dicts for rank 178 successfully loaded 8 ZeRO state_dicts for rank 166 successfully loaded 8 ZeRO state_dicts for rank 161 successfully loaded 8 ZeRO state_dicts for rank 219 successfully loaded 8 ZeRO state_dicts for rank 120 successfully loaded 8 ZeRO state_dicts for rank 56 loading 8 zero partition checkpoints for rank 223 successfully loaded 8 ZeRO state_dicts for rank 54 successfully loaded 8 ZeRO state_dicts for rank 130 successfully loaded 8 ZeRO state_dicts for rank 79 successfully loaded 8 ZeRO state_dicts for rank 218 successfully loaded 8 ZeRO state_dicts for rank 65 
successfully loaded 8 ZeRO state_dicts for rank 115 successfully loaded 8 ZeRO state_dicts for rank 85 loading 8 zero partition checkpoints for rank 156 successfully loaded 8 ZeRO state_dicts for rank 109 successfully loaded 8 ZeRO state_dicts for rank 209 successfully loaded 8 ZeRO state_dicts for rank 152 successfully loaded 8 ZeRO state_dicts for rank 83 successfully loaded 8 ZeRO state_dicts for rank 103 successfully loaded 8 ZeRO state_dicts for rank 66 successfully loaded 8 ZeRO state_dicts for rank 44 successfully loaded 8 ZeRO state_dicts for rank 74 successfully loaded 8 ZeRO state_dicts for rank 96 successfully loaded 8 ZeRO state_dicts for rank 86 successfully loaded 8 ZeRO state_dicts for rank 151 successfully loaded 8 ZeRO state_dicts for rank 171 successfully loaded 8 ZeRO state_dicts for rank 135 successfully loaded 8 ZeRO state_dicts for rank 14 successfully loaded 8 ZeRO state_dicts for rank 64 successfully loaded 8 ZeRO state_dicts for rank 196 successfully loaded 8 ZeRO state_dicts for rank 123 successfully loaded 8 ZeRO state_dicts for rank 136 successfully loaded 8 ZeRO state_dicts for rank 181 successfully loaded 8 ZeRO state_dicts for rank 55 successfully loaded 8 ZeRO state_dicts for rank 228 loading 8 zero partition checkpoints for rank 48 successfully loaded 8 ZeRO state_dicts for rank 124 successfully loaded 8 ZeRO state_dicts for rank 170 successfully loaded 8 ZeRO state_dicts for rank 208 successfully loaded 8 ZeRO state_dicts for rank 105 successfully loaded 8 ZeRO state_dicts for rank 95 successfully loaded 8 ZeRO state_dicts for rank 134 successfully loaded 8 ZeRO state_dicts for rank 153 successfully loaded 8 ZeRO state_dicts for rank 204 successfully loaded 8 ZeRO state_dicts for rank 125 successfully loaded 8 ZeRO state_dicts for rank 111 successfully loaded 8 ZeRO state_dicts for rank 133 successfully loaded 8 ZeRO state_dicts for rank 149 successfully loaded 8 ZeRO state_dicts for rank 194 successfully loaded 8 ZeRO state_dicts 
for rank 148 successfully loaded 8 ZeRO state_dicts for rank 217 successfully loaded 8 ZeRO state_dicts for rank 206 successfully loaded 8 ZeRO state_dicts for rank 114 successfully loaded 8 ZeRO state_dicts for rank 200 loading 8 zero partition checkpoints for rank 220 successfully loaded 8 ZeRO state_dicts for rank 202 successfully loaded 8 ZeRO state_dicts for rank 138 successfully loaded 8 ZeRO state_dicts for rank 139 successfully loaded 8 ZeRO state_dicts for rank 37 successfully loaded 8 ZeRO state_dicts for rank 176 successfully loaded 8 ZeRO state_dicts for rank 168 successfully loaded 8 ZeRO state_dicts for rank 98 successfully loaded 8 ZeRO state_dicts for rank 101 successfully loaded 8 ZeRO state_dicts for rank 39 successfully loaded 8 ZeRO state_dicts for rank 107 successfully loaded 8 ZeRO state_dicts for rank 42 successfully loaded 8 ZeRO state_dicts for rank 8 successfully loaded 8 ZeRO state_dicts for rank 186 successfully loaded 8 ZeRO state_dicts for rank 94 loading 8 zero partition checkpoints for rank 142 successfully loaded 8 ZeRO state_dicts for rank 77 successfully loaded 8 ZeRO state_dicts for rank 137 successfully loaded 8 ZeRO state_dicts for rank 207 successfully loaded 8 ZeRO state_dicts for rank 172 successfully loaded 8 ZeRO state_dicts for rank 199 successfully loaded 8 ZeRO state_dicts for rank 43 successfully loaded 8 ZeRO state_dicts for rank 69 successfully loaded 8 ZeRO state_dicts for rank 205 successfully loaded 8 ZeRO state_dicts for rank 167 successfully loaded 8 ZeRO state_dicts for rank 41 successfully loaded 8 ZeRO state_dicts for rank 80 successfully loaded 8 ZeRO state_dicts for rank 119 successfully loaded 8 ZeRO state_dicts for rank 106 successfully loaded 8 ZeRO state_dicts for rank 187 successfully loaded 8 ZeRO state_dicts for rank 197 successfully loaded 8 ZeRO state_dicts for rank 92 successfully loaded 8 ZeRO state_dicts for rank 236 successfully loaded 8 ZeRO state_dicts for rank 97 successfully loaded 8 ZeRO 
state_dicts for rank 155 successfully loaded 8 ZeRO state_dicts for rank 82 successfully loaded 8 ZeRO state_dicts for rank 185 successfully loaded 8 ZeRO state_dicts for rank 78 successfully loaded 8 ZeRO state_dicts for rank 10 successfully loaded 8 ZeRO state_dicts for rank 71 successfully loaded 8 ZeRO state_dicts for rank 68 successfully loaded 8 ZeRO state_dicts for rank 195 successfully loaded 8 ZeRO state_dicts for rank 102 successfully loaded 8 ZeRO state_dicts for rank 70 successfully loaded 8 ZeRO state_dicts for rank 26 successfully loaded 8 ZeRO state_dicts for rank 180 successfully loaded 8 ZeRO state_dicts for rank 117 loading 8 zero partition checkpoints for rank 75 successfully loaded 8 ZeRO state_dicts for rank 121 successfully loaded 8 ZeRO state_dicts for rank 174 successfully loaded 8 ZeRO state_dicts for rank 24 loading 8 zero partition checkpoints for rank 50 successfully loaded 8 ZeRO state_dicts for rank 179 successfully loaded 8 ZeRO state_dicts for rank 248 successfully loaded 8 ZeRO state_dicts for rank 46 successfully loaded 8 ZeRO state_dicts for rank 12 successfully loaded 8 ZeRO state_dicts for rank 126 successfully loaded 8 ZeRO state_dicts for rank 169 loading 8 zero partition checkpoints for rank 216 loading 8 zero partition checkpoints for rank 215 successfully loaded 8 ZeRO state_dicts for rank 11 successfully loaded 8 ZeRO state_dicts for rank 183 successfully loaded 8 ZeRO state_dicts for rank 162 loading 8 zero partition checkpoints for rank 222 loading 8 zero partition checkpoints for rank 108 successfully loaded 8 ZeRO state_dicts for rank 182 successfully loaded 8 ZeRO state_dicts for rank 27 successfully loaded 8 ZeRO state_dicts for rank 252 successfully loaded 8 ZeRO state_dicts for rank 224 successfully loaded 8 ZeRO state_dicts for rank 201 successfully loaded 8 ZeRO state_dicts for rank 240 successfully loaded 8 ZeRO state_dicts for rank 190 loading 8 zero partition checkpoints for rank 141 loading 8 zero partition 
checkpoints for rank 221 successfully loaded 8 ZeRO state_dicts for rank 193 successfully loaded 8 ZeRO state_dicts for rank 231 successfully loaded 8 ZeRO state_dicts for rank 175 successfully loaded 8 ZeRO state_dicts for rank 122 successfully loaded 8 ZeRO state_dicts for rank 13 loading 8 zero partition checkpoints for rank 157 successfully loaded 8 ZeRO state_dicts for rank 110 successfully loaded 8 ZeRO state_dicts for rank 233 successfully loaded 8 ZeRO state_dicts for rank 118 loading 8 zero partition checkpoints for rank 184 successfully loaded 8 ZeRO state_dicts for rank 198 successfully loaded 8 ZeRO state_dicts for rank 30 successfully loaded 8 ZeRO state_dicts for rank 163 successfully loaded 8 ZeRO state_dicts for rank 244 successfully loaded 8 ZeRO state_dicts for rank 16 successfully loaded 8 ZeRO state_dicts for rank 18 successfully loaded 8 ZeRO state_dicts for rank 250 successfully loaded 8 ZeRO state_dicts for rank 2 successfully loaded 8 ZeRO state_dicts for rank 25 successfully loaded 8 ZeRO state_dicts for rank 230 successfully loaded 8 ZeRO state_dicts for rank 235 successfully loaded 8 ZeRO state_dicts for rank 31 successfully loaded 8 ZeRO state_dicts for rank 177 successfully loaded 8 ZeRO state_dicts for rank 28 successfully loaded 8 ZeRO state_dicts for rank 238 loading 8 zero partition checkpoints for rank 60 loading 8 zero partition checkpoints for rank 144 loading 8 zero partition checkpoints for rank 104 loading 8 zero partition checkpoints for rank 213 loading 8 zero partition checkpoints for rank 89 loading 8 zero partition checkpoints for rank 40 successfully loaded 8 ZeRO state_dicts for rank 239 loading 8 zero partition checkpoints for rank 140 successfully loaded 8 ZeRO state_dicts for rank 191 loading 8 zero partition checkpoints for rank 91 loading 8 zero partition checkpoints for rank 100 successfully loaded 8 ZeRO state_dicts for rank 173 successfully loaded 8 ZeRO state_dicts for rank 232 successfully loaded 8 ZeRO 
state_dicts for rank 22 loading 8 zero partition checkpoints for rank 52 successfully loaded 8 ZeRO state_dicts for rank 188 successfully loaded 8 ZeRO state_dicts for rank 249 successfully loaded 8 ZeRO state_dicts for rank 189 successfully loaded 8 ZeRO state_dicts for rank 237 successfully loaded 8 ZeRO state_dicts for rank 253 successfully loaded 8 ZeRO state_dicts for rank 229 successfully loaded 8 ZeRO state_dicts for rank 29 successfully loaded 8 ZeRO state_dicts for rank 226 successfully loaded 8 ZeRO state_dicts for rank 251 loading 8 zero partition checkpoints for rank 212 successfully loaded 8 ZeRO state_dicts for rank 17 successfully loaded 8 ZeRO state_dicts for rank 241 loading 8 zero partition checkpoints for rank 214 successfully loaded 8 ZeRO state_dicts for rank 9 successfully loaded 8 ZeRO state_dicts for rank 255 successfully loaded 8 ZeRO state_dicts for rank 15 successfully loaded 8 ZeRO state_dicts for rank 245 loading 8 zero partition checkpoints for rank 211 successfully loaded 8 ZeRO state_dicts for rank 246 loading 8 zero partition checkpoints for rank 87 successfully loaded 8 ZeRO state_dicts for rank 242 successfully loaded 8 ZeRO state_dicts for rank 227 successfully loaded 8 ZeRO state_dicts for rank 243 successfully loaded 8 ZeRO state_dicts for rank 247 loading 8 zero partition checkpoints for rank 143 successfully loaded 8 ZeRO state_dicts for rank 19 loading 8 zero partition checkpoints for rank 116 loading 8 zero partition checkpoints for rank 132 loading 8 zero partition checkpoints for rank 88 successfully loaded 8 ZeRO state_dicts for rank 20 loading 8 zero partition checkpoints for rank 49 loading 8 zero partition checkpoints for rank 128 loading 8 zero partition checkpoints for rank 154 loading 8 zero partition checkpoints for rank 165 loading 8 zero partition checkpoints for rank 62 successfully loaded 8 ZeRO state_dicts for rank 254 loading 8 zero partition checkpoints for rank 93 successfully loaded 8 ZeRO state_dicts for 
rank 225 loading 8 zero partition checkpoints for rank 81 loading 8 zero partition checkpoints for rank 127 loading 8 zero partition checkpoints for rank 76 loading 8 zero partition checkpoints for rank 99 loading 8 zero partition checkpoints for rank 57 successfully loaded 8 ZeRO state_dicts for rank 0 loading 8 zero partition checkpoints for rank 90 loading 8 zero partition checkpoints for rank 73 successfully loaded 8 ZeRO state_dicts for rank 1 successfully loaded 8 ZeRO state_dicts for rank 234 loading 8 zero partition checkpoints for rank 166 successfully loaded 8 ZeRO state_dicts for rank 3 loading 8 zero partition checkpoints for rank 84 loading 8 zero partition checkpoints for rank 113 loading 8 zero partition checkpoints for rank 147 loading 8 zero partition checkpoints for rank 219 loading 8 zero partition checkpoints for rank 51 loading 8 zero partition checkpoints for rank 72 loading 8 zero partition checkpoints for rank 58 loading 8 zero partition checkpoints for rank 160 loading 8 zero partition checkpoints for rank 56 loading 8 zero partition checkpoints for rank 158 loading 8 zero partition checkpoints for rank 65 loading 8 zero partition checkpoints for rank 130 loading 8 zero partition checkpoints for rank 115 loading 8 zero partition checkpoints for rank 67 successfully loaded 8 ZeRO state_dicts for rank 21 loading 8 zero partition checkpoints for rank 209 loading 8 zero partition checkpoints for rank 109 loading 8 zero partition checkpoints for rank 44 loading 8 zero partition checkpoints for rank 74 loading 8 zero partition checkpoints for rank 86 loading 8 zero partition checkpoints for rank 45 loading 8 zero partition checkpoints for rank 83 loading 8 zero partition checkpoints for rank 171 loading 8 zero partition checkpoints for rank 136 successfully loaded 8 ZeRO state_dicts for rank 23 loading 8 zero partition checkpoints for rank 218 loading 8 zero partition checkpoints for rank 159 loading 8 zero partition checkpoints for rank 196 
loading 8 zero partition checkpoints for rank 66
loading 8 zero partition checkpoints for rank 125
loading 8 zero partition checkpoints for rank 111
loading 8 zero partition checkpoints for rank 181
loading 8 zero partition checkpoints for rank 151
loading 8 zero partition checkpoints for rank 64
loading 8 zero partition checkpoints for rank 134
loading 8 zero partition checkpoints for rank 85
loading 8 zero partition checkpoints for rank 206
loading 8 zero partition checkpoints for rank 120
loading 8 zero partition checkpoints for rank 37
loading 8 zero partition checkpoints for rank 146
loading 8 zero partition checkpoints for rank 95
loading 8 zero partition checkpoints for rank 194
loading 8 zero partition checkpoints for rank 202
loading 8 zero partition checkpoints for rank 178
loading 8 zero partition checkpoints for rank 138
loading 8 zero partition checkpoints for rank 170
loading 8 zero partition checkpoints for rank 55
loading 8 zero partition checkpoints for rank 61
loading 8 zero partition checkpoints for rank 101
loading 8 zero partition checkpoints for rank 124
loading 8 zero partition checkpoints for rank 135
loading 8 zero partition checkpoints for rank 148
loading 8 zero partition checkpoints for rank 139
loading 8 zero partition checkpoints for rank 14
loading 8 zero partition checkpoints for rank 77
loading 8 zero partition checkpoints for rank 39
loading 8 zero partition checkpoints for rank 152
loading 8 zero partition checkpoints for rank 59
loading 8 zero partition checkpoints for rank 80
loading 8 zero partition checkpoints for rank 106
loading 8 zero partition checkpoints for rank 69
loading 8 zero partition checkpoints for rank 79
loading 8 zero partition checkpoints for rank 47
loading 8 zero partition checkpoints for rank 203
loading 8 zero partition checkpoints for rank 94
loading 8 zero partition checkpoints for rank 186
loading 8 zero partition checkpoints for rank 217
loading 8 zero partition checkpoints for rank 97
loading 8 zero partition checkpoints for rank 92
loading 8 zero partition checkpoints for rank 71
loading 8 zero partition checkpoints for rank 164
loading 8 zero partition checkpoints for rank 41
loading 8 zero partition checkpoints for rank 103
loading 8 zero partition checkpoints for rank 131
loading 8 zero partition checkpoints for rank 197
loading 8 zero partition checkpoints for rank 112
loading 8 zero partition checkpoints for rank 145
loading 8 zero partition checkpoints for rank 180
loading 8 zero partition checkpoints for rank 70
loading 8 zero partition checkpoints for rank 63
loading 8 zero partition checkpoints for rank 123
loading 8 zero partition checkpoints for rank 137
loading 8 zero partition checkpoints for rank 82
loading 8 zero partition checkpoints for rank 150
loading 8 zero partition checkpoints for rank 68
loading 8 zero partition checkpoints for rank 228
loading 8 zero partition checkpoints for rank 187
loading 8 zero partition checkpoints for rank 205
loading 8 zero partition checkpoints for rank 8
loading 8 zero partition checkpoints for rank 46
loading 8 zero partition checkpoints for rank 117
loading 8 zero partition checkpoints for rank 185
loading 8 zero partition checkpoints for rank 183
loading 8 zero partition checkpoints for rank 168
loading 8 zero partition checkpoints for rank 133
loading 8 zero partition checkpoints for rank 155
loading 8 zero partition checkpoints for rank 176
loading 8 zero partition checkpoints for rank 119
loading 8 zero partition checkpoints for rank 153
loading 8 zero partition checkpoints for rank 121
loading 8 zero partition checkpoints for rank 42
loading 8 zero partition checkpoints for rank 102
loading 8 zero partition checkpoints for rank 96
loading 8 zero partition checkpoints for rank 236
loading 8 zero partition checkpoints for rank 201
loading 8 zero partition checkpoints for rank 179
loading 8 zero partition checkpoints for rank 162
loading 8 zero partition checkpoints for rank 182
loading 8 zero partition checkpoints for rank 43
loading 8 zero partition checkpoints for rank 107
loading 8 zero partition checkpoints for rank 129
loading 8 zero partition checkpoints for rank 110
loading 8 zero partition checkpoints for rank 38
loading 8 zero partition checkpoints for rank 126
loading 8 zero partition checkpoints for rank 105
loading 8 zero partition checkpoints for rank 193
loading 8 zero partition checkpoints for rank 118
loading 8 zero partition checkpoints for rank 248
loading 8 zero partition checkpoints for rank 114
loading 8 zero partition checkpoints for rank 122
loading 8 zero partition checkpoints for rank 200
loading 8 zero partition checkpoints for rank 33
loading 8 zero partition checkpoints for rank 177
loading 8 zero partition checkpoints for rank 149
loading 8 zero partition checkpoints for rank 36
loading 8 zero partition checkpoints for rank 233
loading 8 zero partition checkpoints for rank 53
loading 8 zero partition checkpoints for rank 161
loading 8 zero partition checkpoints for rank 12
loading 8 zero partition checkpoints for rank 244
loading 8 zero partition checkpoints for rank 78
loading 8 zero partition checkpoints for rank 30
loading 8 zero partition checkpoints for rank 98
loading 8 zero partition checkpoints for rank 204
loading 8 zero partition checkpoints for rank 16
loading 8 zero partition checkpoints for rank 169
loading 8 zero partition checkpoints for rank 28
loading 8 zero partition checkpoints for rank 199
loading 8 zero partition checkpoints for rank 230
loading 8 zero partition checkpoints for rank 224
loading 8 zero partition checkpoints for rank 35
loading 8 zero partition checkpoints for rank 240
loading 8 zero partition checkpoints for rank 167
loading 8 zero partition checkpoints for rank 54
loading 8 zero partition checkpoints for rank 210
loading 8 zero partition checkpoints for rank 27
loading 8 zero partition checkpoints for rank 10
loading 8 zero partition checkpoints for rank 190
loading 8 zero partition checkpoints for rank 192
loading 8 zero partition checkpoints for rank 34
loading 8 zero partition checkpoints for rank 252
loading 8 zero partition checkpoints for rank 163
loading 8 zero partition checkpoints for rank 13
loading 8 zero partition checkpoints for rank 207
loading 8 zero partition checkpoints for rank 191
loading 8 zero partition checkpoints for rank 32
loading 8 zero partition checkpoints for rank 231
loading 8 zero partition checkpoints for rank 26
loading 8 zero partition checkpoints for rank 9
loading 8 zero partition checkpoints for rank 255
loading 8 zero partition checkpoints for rank 11
loading 8 zero partition checkpoints for rank 175
loading 8 zero partition checkpoints for rank 241
loading 8 zero partition checkpoints for rank 25
loading 8 zero partition checkpoints for rank 189
loading 8 zero partition checkpoints for rank 17
loading 8 zero partition checkpoints for rank 24
loading 8 zero partition checkpoints for rank 245
loading 8 zero partition checkpoints for rank 208
loading 8 zero partition checkpoints for rank 198
loading 8 zero partition checkpoints for rank 254
loading 8 zero partition checkpoints for rank 237
loading 8 zero partition checkpoints for rank 188
loading 8 zero partition checkpoints for rank 251
loading 8 zero partition checkpoints for rank 225
loading 8 zero partition checkpoints for rank 0
checkpoint version 3.0
loading 8 zero partition checkpoints for rank 253
loading 8 zero partition checkpoints for rank 229
loading 8 zero partition checkpoints for rank 250
loading 8 zero partition checkpoints for rank 195
loading 8 zero partition checkpoints for rank 173
loading 8 zero partition checkpoints for rank 1
loading 8 zero partition checkpoints for rank 234
loading 8 zero partition checkpoints for rank 15
loading 8 zero partition checkpoints for rank 239
loading 8 zero partition checkpoints for rank 247
loading 8 zero partition checkpoints for rank 246
loading 8 zero partition checkpoints for rank 172
loading 8 zero partition
checkpoints for rank 249
loading 8 zero partition checkpoints for rank 238
loading 8 zero partition checkpoints for rank 31
loading 8 zero partition checkpoints for rank 243
loading 8 zero partition checkpoints for rank 242
loading 8 zero partition checkpoints for rank 174
loading 8 zero partition checkpoints for rank 226
loading 8 zero partition checkpoints for rank 29
loading 8 zero partition checkpoints for rank 18
loading 8 zero partition checkpoints for rank 227
loading 8 zero partition checkpoints for rank 19
loading 8 zero partition checkpoints for rank 2
loading 8 zero partition checkpoints for rank 235
loading 8 zero partition checkpoints for rank 232
loading 8 zero partition checkpoints for rank 3
loading 8 zero partition checkpoints for rank 22
loading 8 zero partition checkpoints for rank 20
loading 8 zero partition checkpoints for rank 21
loading 8 zero partition checkpoints for rank 23
successfully loaded 8 ZeRO state_dicts for rank 6
loading 8 zero partition checkpoints for rank 6
successfully loaded 8 ZeRO state_dicts for rank 7
successfully loaded 8 ZeRO state_dicts for rank 4
successfully loaded 8 ZeRO state_dicts for rank 5
WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-25 04:30:09 CEST)" was missed by 0:00:03.764782
loading 8 zero partition checkpoints for rank 4
loading 8 zero partition checkpoints for rank 7
loading 8 zero partition checkpoints for rank 5
successfully loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints at iteration 6210
time (ms) | load-checkpoint: 91691.46
[after model, optimizer, and learning rate scheduler are built] datetime: 2021-09-25 04:29:17
> building train, validation, and test datasets ...
> datasets target sizes (minimum size):
    train: 300000000
    validation: 1638400
    test: 10240
> building train, validation, and test datasets for GPT ...
> building dataset index ...
reading sizes...
reading pointers...
reading document index...
creating numpy buffer of mmap...
creating memory view of numpy buffer...
> finished creating indexed dataset in 0.138486 seconds
number of documents: 304230423
> dataset split:
    train:
        document indices in [0, 288714672) total of 288714672 documents
    validation:
        document indices in [288714672, 303926193) total of 15211521 documents
    test:
        document indices in [303926193, 304230423) total of 304230 documents
> loading doc-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_train_indexmap_300000000ns_2048sl_42s_doc_idx.npy
> loading sample-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_train_indexmap_300000000ns_2048sl_42s_sample_idx.npy
> loading shuffle-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_train_indexmap_300000000ns_2048sl_42s_shuffle_idx.npy
loaded indexed file in 0.350 seconds
total number of samples: 394611670
total number of epochs: 3
> loading doc-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_valid_indexmap_1638400ns_2048sl_42s_doc_idx.npy
> loading sample-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_valid_indexmap_1638400ns_2048sl_42s_sample_idx.npy
> loading shuffle-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_valid_indexmap_1638400ns_2048sl_42s_shuffle_idx.npy
loaded indexed file in 0.276 seconds
total number of samples: 6927161
total number of epochs: 1
> loading doc-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_test_indexmap_10240ns_2048sl_42s_doc_idx.npy
> loading sample-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_test_indexmap_10240ns_2048sl_42s_sample_idx.npy
> loading shuffle-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_test_indexmap_10240ns_2048sl_42s_shuffle_idx.npy
loaded indexed file in 0.080 seconds
total number of samples: 137384
total number of epochs: 1
> finished creating GPT datasets ...
[after dataloaders are built] datetime: 2021-09-25 04:29:23
done with setup ...
training ...
time (ms) | model-and-optimizer-setup: 99723.96 | train/valid/test-data-iterators-setup: 5641.98
[before the start of training step] datetime: 2021-09-25 04:29:23
[2021-09-25 04:29:23,929] [INFO] [checkpointing.py:408:forward] Activation Checkpointing Information
[2021-09-25 04:29:23,930] [INFO] [checkpointing.py:409:forward] ----Partition Activations False, CPU CHECKPOINTING False
[2021-09-25 04:29:23,930] [INFO] [checkpointing.py:412:forward] ----contiguous Memory Checkpointing False with 32 total layers
[2021-09-25 04:29:23,930] [INFO] [checkpointing.py:415:forward] ----Synchronization False
[2021-09-25 04:29:23,930] [INFO] [checkpointing.py:416:forward] ----Profiling time in checkpointing False
[Rank 1] (after 6220 iterations) memory (MB) | allocated: 6689.83056640625 | max allocated: 13899.01416015625 | reserved: 23406.0 | max reserved: 23406.0
[Rank 225] (after 6220 iterations) memory (MB) | allocated: 7107.7119140625 | max allocated: 11885.68994140625 | reserved: 21700.0 | max reserved: 21700.0
[Rank 226] (after 6220 iterations) memory (MB) | allocated: 7107.7119140625 | max allocated: 11885.6884765625 | reserved: 22492.0 | max reserved: 22492.0
[Rank 2] (after 6220 iterations) memory (MB) | allocated: 6689.83056640625 | max allocated: 13899.01416015625 | reserved: 23406.0 | max reserved: 23406.0
[Rank 0] (after 6220 iterations) memory (MB) | allocated: 6689.83056640625 | max allocated: 13899.01416015625 | reserved: 23726.0 | max reserved: 23726.0
[Rank 224] (after 6220 iterations) memory (MB) | allocated: 7107.7119140625 | max allocated: 11885.68896484375 | reserved: 22492.0 | max reserved: 22492.0
[Rank 3] (after 6220 iterations) memory (MB) | allocated: 6689.83056640625 | max allocated: 13899.01416015625 | reserved: 23374.0 | max reserved: 23374.0
[Rank 227] (after 6220 iterations) memory (MB) | allocated: 7107.7119140625 | max allocated: 11885.68994140625 | reserved: 22492.0 | max reserved: 22492.0
iteration 6220/ 159576 | consumed samples: 194400 | elapsed time per iteration (ms): 18925.1 | learning rate: 5.378E-05 | global batch size: 80 | lm loss: 6.332304E+00 | loss scale: 4096.0 | grad norm: 207900.224 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
[Rank 33] (after 6220 iterations) memory (MB) | allocated: 5861.55029296875 | max allocated: 12082.4677734375 | reserved: 20130.0 | max reserved: 20130.0
[Rank 97] (after 6220 iterations) memory (MB) | allocated: 5861.55029296875 | max allocated: 11538.466796875 | reserved: 19402.0 | max reserved: 19402.0
[Rank 161] (after 6220 iterations) memory (MB) | allocated: 5861.55029296875 | max allocated: 10994.4658203125 | reserved: 18826.0 | max reserved: 18826.0
[Rank 193] (after 6220 iterations) memory (MB) | allocated: 5861.55029296875 | max allocated: 10722.46533203125 | reserved: 18826.0 | max reserved: 18826.0
[Rank 129] (after 6220 iterations) memory (MB) | allocated: 5861.55029296875 | max allocated: 11266.46630859375 | reserved: 19662.0 | max reserved: 19662.0
[Rank 65] (after 6220 iterations) memory (MB) | allocated: 5861.55029296875 | max allocated: 11810.46728515625 | reserved: 19946.0 | max reserved: 19946.0
[Rank 34] (after 6220 iterations) memory (MB) | allocated: 5861.55029296875 | max allocated: 12082.4677734375 | reserved: 20170.0 | max reserved: 20170.0
[Rank 162] (after 6220 iterations) memory (MB) | allocated: 5861.55029296875 | max allocated: 10994.4658203125 | reserved: 18826.0 | max reserved: 18826.0
[Rank 130] (after 6220 iterations) memory (MB) | allocated: 5861.55029296875 | max allocated: 11266.46630859375 | reserved: 19390.0 | max reserved: 19390.0
[Rank 98] (after 6220 iterations) memory (MB) | allocated: 5861.55029296875 | max allocated: 11538.466796875 | reserved: 19722.0 | max reserved: 19722.0
[Rank 194] (after 6220 iterations) memory (MB) | allocated: 5861.55029296875 | max allocated: 10722.46533203125 | reserved: 18826.0 | max reserved: 18826.0
[Rank 66] (after 6220 iterations) memory (MB) | allocated: 5861.55029296875 | max allocated: 11810.46728515625 | reserved: 20094.0 | max reserved: 20094.0
[Rank 32] (after 6220 iterations) memory (MB) | allocated: 5861.55029296875 | max allocated: 12082.4677734375 | reserved: 20456.0 | max reserved: 20456.0
[Rank 128] (after 6220 iterations) memory (MB) | allocated: 5861.55029296875 | max allocated: 11266.46630859375 | reserved: 19908.0 | max reserved: 19908.0
[Rank 96] (after 6220 iterations) memory (MB) | allocated: 5861.55029296875 | max allocated: 11538.466796875 | reserved: 19828.0 | max reserved: 19828.0
[Rank 64] (after 6220 iterations) memory (MB) | allocated: 5861.55029296875 | max allocated: 11810.46728515625 | reserved: 20328.0 | max reserved: 20328.0
[Rank 192] (after 6220 iterations) memory (MB) | allocated: 5861.55029296875 | max allocated: 10722.46533203125 | reserved: 19396.0 | max reserved: 19396.0
[Rank 160] (after 6220 iterations) memory (MB) | allocated: 5861.55029296875 | max allocated: 10994.4658203125 | reserved: 19572.0 | max reserved: 19572.0
[Rank 99] (after 6220 iterations) memory (MB) | allocated: 5861.55029296875 | max allocated: 11538.466796875 | reserved: 19662.0 | max reserved: 19662.0
[Rank 67] (after 6220 iterations) memory (MB) | allocated: 5861.55029296875 | max allocated: 11810.46728515625 | reserved: 19966.0 | max reserved: 19966.0
[Rank 131] (after 6220 iterations) memory (MB) | allocated: 5861.55029296875 | max allocated: 11266.46630859375 | reserved: 19578.0 | max reserved: 19578.0
[Rank 35] (after 6220 iterations) memory (MB) | allocated: 5861.55029296875 | max allocated: 12082.4677734375 | reserved: 20078.0 | max reserved: 20078.0
[Rank 195] (after 6220 iterations) memory (MB) | allocated: 5861.55029296875 | max allocated: 10722.46533203125 | reserved: 18842.0 | max reserved: 18842.0
[Rank 163] (after 6220 iterations) memory (MB) | allocated: 5861.55029296875 | max allocated: 10994.4658203125 | reserved: 19066.0 | max reserved: 19066.0
iteration 6230/ 159576 | consumed samples: 195200 | elapsed time per iteration (ms): 17419.3 | learning rate: 5.400E-05 | global batch size: 80 | lm loss: 6.312761E+00 | loss scale: 4096.0 | grad norm: 102010.658 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6240/ 159576 | consumed samples: 196000 | elapsed time per iteration (ms): 17458.3 | learning rate: 5.423E-05 | global batch size: 80 | lm loss: 6.325917E+00 | loss scale: 4096.0 | grad norm: 139671.438 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6250/ 159576 | consumed samples: 196800 | elapsed time per iteration (ms): 17438.0 | learning rate: 5.445E-05 | global batch size: 80 | lm loss: 6.330989E+00 | loss scale: 4096.0 | grad norm: 117429.787 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6260/ 159576 | consumed samples: 197600 | elapsed time per iteration (ms): 17495.4 | learning rate: 5.467E-05 | global batch size: 80 | lm loss: 6.330341E+00 | loss scale: 4096.0 | grad norm: 101380.992 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6270/ 159576 | consumed samples: 198400 | elapsed time per iteration (ms): 17488.9 | learning rate: 5.489E-05 | global batch size: 80 | lm loss: 6.304220E+00 | loss scale: 4096.0 | grad norm: 137994.450 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6280/ 159576 | consumed samples: 199200 | elapsed time per iteration (ms): 17456.9 | learning rate: 5.511E-05 | global batch size: 80 | lm loss: 6.302861E+00 | loss scale: 4096.0 | grad norm: 117645.788 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6290/ 159576 | consumed samples: 200000 | elapsed time per iteration (ms): 16818.4 | learning rate: 5.531E-05 | global batch size: 80 | lm loss: 6.313686E+00 | loss scale: 4096.0 | grad norm: 87880.797 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6300/ 159576 | consumed samples: 200800 | elapsed time per iteration (ms): 17519.8 | learning rate: 5.554E-05 | global batch size: 80 | lm loss: 6.270583E+00 | loss scale: 4096.0 | grad norm: 86063.377 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6310/ 159576 | consumed samples: 201600 | elapsed time per iteration (ms): 17461.4 | learning rate: 5.576E-05 | global batch size: 80 | lm loss: 6.315401E+00 | loss scale: 4096.0 | grad norm: 120394.115 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6320/ 159576 | consumed samples: 202400 | elapsed time per iteration (ms): 17455.8 | learning rate: 5.598E-05 | global batch size: 80 | lm loss: 6.326277E+00 | loss scale: 4096.0 | grad norm: 95784.457 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6330/ 159576 | consumed samples: 203200 | elapsed time per iteration (ms): 17431.8 | learning rate: 5.620E-05 | global batch size: 80 | lm loss: 6.333566E+00 | loss scale: 4096.0 | grad norm: 119951.862 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6340/ 159576 | consumed samples: 204000 | elapsed time per iteration (ms): 16668.3 | learning rate: 5.640E-05 | global batch size: 80 | lm loss: 6.321040E+00 | loss scale: 2048.0 | grad norm: 54351.143 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
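The `> dataset split:` boundaries above follow from splitting the 304,230,423 documents 949/50/1 between train, validation, and test. A sketch of the cumulative-rounding scheme Megatron-LM's split helper uses (written here from memory of `get_train_valid_test_split_`, so treat the exact algorithm as an assumption: round each cumulative boundary, then shift the boundaries so the last one lands exactly on the document count) reproduces the logged indices:

```python
def train_valid_test_split(size, weights):
    """Split `size` documents by normalized weights using cumulative
    rounding, then correct any off-by-rounding drift so the final
    boundary equals `size` exactly."""
    total = sum(weights)
    fractions = [w / total for w in weights]
    idx = [0]
    for f in fractions:
        idx.append(idx[-1] + int(round(f * size)))
    # Rounding can leave the last boundary off by a document or two;
    # shift all boundaries after the first so the split covers [0, size).
    diff = idx[-1] - size
    if diff != 0:
        idx = [idx[0]] + [i - diff for i in idx[1:]]
    return idx

print(train_valid_test_split(304230423, [949, 50, 1]))
# -> [0, 288714672, 303926193, 304230423], matching the boundaries in the log
```

The shift step is why train gets 288,714,672 documents rather than the naively rounded 288,714,671.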
[2021-09-25 05:08:29] PULSE: tr8-104B is waiting for the previous Job Array job to finish before scheduling a new one (1185639_[2-10%1] on 'gpu_p13' partition)
[2021-09-25 05:08:29] PULSE: tr8-104B is running for 41:28 since 2021-09-25T04:27:01 (1185639_1 on 'gpu_p13' partition (r6i5n[7-8],r6i6n0,r7i2n[4-5],r7i3n2,r7i6n[2-4],r7i7n[7-8],r8i0n[0,2-3,5-8],r8i1n[0,2-4],r8i2n8,r8i3n[0-2],r8i5n[3-4],r8i7n[3-8],r9i0n[0-5],r9i1n[0-3],r9i2n[3-6,8],r9i3n[0-1,7-8],r9i4n[0-3],r9i5n[3-8],r9i6n0)
iteration 6350/ 159576 | consumed samples: 204800 | elapsed time per iteration (ms): 17330.6 | learning rate: 5.662E-05 | global batch size: 80 | lm loss: 6.297153E+00 | loss scale: 2048.0 | grad norm: 61555.753 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6360/ 159576 | consumed samples: 205600 | elapsed time per iteration (ms): 17390.9 | learning rate: 5.684E-05 | global batch size: 80 | lm loss: 6.296333E+00 | loss scale: 2048.0 | grad norm: 67211.747 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6370/ 159576 | consumed samples: 206400 | elapsed time per iteration (ms): 17338.2 | learning rate: 5.707E-05 | global batch size: 80 | lm loss: 6.309451E+00 | loss scale: 2048.0 | grad norm: 66671.395 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6380/ 159576 | consumed samples: 207200 | elapsed time per iteration (ms): 17380.7 | learning rate: 5.729E-05 | global batch size: 80 | lm loss: 6.301356E+00 | loss scale: 2048.0 | grad norm: 45299.990 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6390/ 159576 | consumed samples: 208000 | elapsed time per iteration (ms): 17366.7 | learning rate: 5.751E-05 | global batch size: 80 | lm loss: 6.335297E+00 | loss scale: 2048.0 | grad norm: 59836.646 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6400/ 159576 | consumed samples: 208800 | elapsed time per iteration (ms): 17383.7 | learning rate: 5.773E-05 | global batch size: 80 | lm loss: 6.303946E+00 | loss scale: 2048.0 | grad norm: 55594.564 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6410/ 159576 | consumed samples: 209600 | elapsed time per iteration (ms): 17402.0 | learning rate: 5.795E-05 | global batch size: 80 | lm loss: 6.335719E+00 | loss scale: 2048.0 | grad norm: 63504.303 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6420/ 159576 | consumed samples: 210400 | elapsed time per iteration (ms): 17371.7 | learning rate: 5.818E-05 | global batch size: 80 | lm loss: 6.278386E+00 | loss scale: 2048.0 | grad norm: 252963.122 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6430/ 159576 | consumed samples: 211200 | elapsed time per iteration (ms): 17394.4 | learning rate: 5.840E-05 | global batch size: 80 | lm loss: 6.309026E+00 | loss scale: 2048.0 | grad norm: 70987.021 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6440/ 159576 | consumed samples: 212000 | elapsed time per iteration (ms): 17385.8 | learning rate: 5.862E-05 | global batch size: 80 | lm loss: 6.352011E+00 | loss scale: 2048.0 | grad norm: 57730.647 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6450/ 159576 | consumed samples: 212800 | elapsed time per iteration (ms): 17363.4 | learning rate: 5.884E-05 | global batch size: 80 | lm loss: 6.338916E+00 | loss scale: 2048.0 | grad norm: 74089.414 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6460/ 159576 | consumed samples: 213600 | elapsed time per iteration (ms): 17402.1 | learning rate: 5.906E-05 | global batch size: 80 | lm loss: 6.307239E+00 | loss scale: 2048.0 | grad norm: 43748.712 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6470/ 159576 | consumed samples: 214400 | elapsed time per iteration (ms): 17495.0 | learning rate: 5.929E-05 | global batch size: 80 | lm loss: 6.336151E+00 | loss scale: 2048.0 | grad norm: 39508.293 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6480/ 159576 | consumed samples: 215200 | elapsed time per iteration (ms): 17462.6 | learning rate: 5.951E-05 | global batch size: 80 | lm loss: 6.356039E+00 | loss scale: 2048.0 | grad norm: 37602.564 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6490/ 159576 | consumed samples: 216000 | elapsed time per iteration (ms): 17419.0 | learning rate: 5.973E-05 | global batch size: 80 | lm loss: 6.355389E+00 | loss scale: 2048.0 | grad norm: 44833.008 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6500/ 159576 | consumed samples: 216800 | elapsed time per iteration (ms): 17489.2 | learning rate: 5.995E-05 | global batch size: 80 | lm loss: 6.336482E+00 | loss scale: 2048.0 | grad norm: 54162.793 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6510/ 159576 | consumed samples: 217600 | elapsed time per iteration (ms): 17458.7 | learning rate: 6.000E-05 | global batch size: 80 | lm loss: 6.337574E+00 | loss scale: 2048.0 | grad norm: 54595.463 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6520/ 159576 | consumed samples: 218400 | elapsed time per iteration (ms): 17515.2 | learning rate: 6.000E-05 | global batch size: 80 | lm loss: 6.356417E+00 | loss scale: 2048.0 | grad norm: 49879.304 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6530/ 159576 | consumed samples: 219200 | elapsed time per iteration (ms): 17447.6 | learning rate: 6.000E-05 | global batch size: 80 | lm loss: 6.369381E+00 | loss scale: 2048.0 | grad norm: 60963.731 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6540/ 159576 | consumed samples: 220000 | elapsed time per iteration (ms): 17448.8 | learning rate: 6.000E-05 | global batch size: 80 | lm loss: 6.338880E+00 | loss scale: 2048.0 | grad norm: 59382.431 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6550/ 159576 | consumed samples: 220800 | elapsed time per iteration (ms): 17544.1 | learning rate: 6.000E-05 | global batch size: 80 | lm loss: 6.331310E+00 | loss scale: 2048.0 | grad norm: 62265.638 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
[2021-09-25 06:08:34] PULSE: tr8-104B is waiting for the previous Job Array job to finish before scheduling a new one (1185639_[2-10%1] on 'gpu_p13' partition)
[2021-09-25 06:08:34] PULSE: tr8-104B is running for 1:41:33 since 2021-09-25T04:27:01 (1185639_1 on 'gpu_p13' partition (r6i5n[7-8],r6i6n0,r7i2n[4-5],r7i3n2,r7i6n[2-4],r7i7n[7-8],r8i0n[0,2-3,5-8],r8i1n[0,2-4],r8i2n8,r8i3n[0-2],r8i5n[3-4],r8i7n[3-8],r9i0n[0-5],r9i1n[0-3],r9i2n[3-6,8],r9i3n[0-1,7-8],r9i4n[0-3],r9i5n[3-8],r9i6n0)
iteration 6560/ 159576 | consumed samples: 221600 | elapsed time per iteration (ms): 17470.3 | learning rate: 6.000E-05 | global batch size: 80 | lm loss: 6.312242E+00 | loss scale: 2048.0 | grad norm: 58830.808 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6570/ 159576 | consumed samples: 222400 | elapsed time per iteration (ms): 17497.8 | learning rate: 6.000E-05 | global batch size: 80 | lm loss: 6.305868E+00 | loss scale: 2048.0 | grad norm: 95845.470 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
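The `loss scale` column shows standard fp16 dynamic loss scaling at work: the scale halved from 4096.0 to 2048.0 just before iteration 6340 (a gradient overflow, whose step is skipped), and later in the log it doubles back once enough consecutive iterations complete without overflow. A minimal sketch of that policy (the 500-step growth window and bounds are illustrative assumptions, not this run's configured values):

```python
class DynamicLossScaler:
    """Minimal dynamic fp16 loss scaler: halve the scale on gradient
    overflow, double it again after `window` consecutive clean steps.
    Illustrative sketch only; not the run's actual implementation."""
    def __init__(self, scale=4096.0, window=500, min_scale=1.0):
        self.scale, self.window, self.min_scale = scale, window, min_scale
        self.clean_steps = 0

    def update(self, overflow):
        if overflow:
            # The overflowing step is skipped; back off the scale.
            self.scale = max(self.scale / 2, self.min_scale)
            self.clean_steps = 0
        else:
            self.clean_steps += 1
            if self.clean_steps == self.window:
                self.scale *= 2
                self.clean_steps = 0
        return self.scale

scaler = DynamicLossScaler()
scaler.update(overflow=True)       # 4096.0 -> 2048.0, as seen around iteration 6340
for _ in range(500):
    scaler.update(overflow=False)  # a clean window doubles the scale back
print(scaler.scale)
# -> 4096.0
```

Interestingly, the scale here returns to 4096.0 at iteration 6840, about 500 iterations after the drop, which is consistent with a growth window of that order.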
iteration 6580/ 159576 | consumed samples: 223200 | elapsed time per iteration (ms): 17465.4 | learning rate: 6.000E-05 | global batch size: 80 | lm loss: 6.323441E+00 | loss scale: 2048.0 | grad norm: 67257.778 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6590/ 159576 | consumed samples: 224000 | elapsed time per iteration (ms): 17539.4 | learning rate: 6.000E-05 | global batch size: 80 | lm loss: 6.324122E+00 | loss scale: 2048.0 | grad norm: 68019.685 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6600/ 159576 | consumed samples: 224800 | elapsed time per iteration (ms): 17523.7 | learning rate: 6.000E-05 | global batch size: 80 | lm loss: 6.367977E+00 | loss scale: 2048.0 | grad norm: 72056.426 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6610/ 159576 | consumed samples: 225600 | elapsed time per iteration (ms): 17492.9 | learning rate: 6.000E-05 | global batch size: 80 | lm loss: 6.308113E+00 | loss scale: 2048.0 | grad norm: 149731.321 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6620/ 159576 | consumed samples: 226400 | elapsed time per iteration (ms): 17537.3 | learning rate: 6.000E-05 | global batch size: 80 | lm loss: 6.354418E+00 | loss scale: 2048.0 | grad norm: 62412.313 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6630/ 159576 | consumed samples: 227200 | elapsed time per iteration (ms): 17517.5 | learning rate: 6.000E-05 | global batch size: 80 | lm loss: 6.357222E+00 | loss scale: 2048.0 | grad norm: 85289.584 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6640/ 159576 | consumed samples: 228000 | elapsed time per iteration (ms): 17515.1 | learning rate: 6.000E-05 | global batch size: 80 | lm loss: 6.340989E+00 | loss scale: 2048.0 | grad norm: 56974.928 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6650/ 159576 | consumed samples: 228800 | elapsed time per iteration (ms): 17504.4 | learning rate: 6.000E-05 | global batch size: 80 | lm loss: 6.343948E+00 | loss scale: 2048.0 | grad norm: 94205.551 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6660/ 159576 | consumed samples: 229600 | elapsed time per iteration (ms): 17528.5 | learning rate: 6.000E-05 | global batch size: 80 | lm loss: 6.349052E+00 | loss scale: 2048.0 | grad norm: 59116.810 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6670/ 159576 | consumed samples: 230400 | elapsed time per iteration (ms): 17539.0 | learning rate: 6.000E-05 | global batch size: 80 | lm loss: 6.319823E+00 | loss scale: 2048.0 | grad norm: 89145.444 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6680/ 159576 | consumed samples: 231200 | elapsed time per iteration (ms): 17492.6 | learning rate: 6.000E-05 | global batch size: 80 | lm loss: 6.322467E+00 | loss scale: 2048.0 | grad norm: 79513.773 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6690/ 159576 | consumed samples: 232000 | elapsed time per iteration (ms): 17427.8 | learning rate: 6.000E-05 | global batch size: 80 | lm loss: 6.351400E+00 | loss scale: 2048.0 | grad norm: 80270.152 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6700/ 159576 | consumed samples: 232800 | elapsed time per iteration (ms): 17427.9 | learning rate: 6.000E-05 | global batch size: 80 | lm loss: 6.321815E+00 | loss scale: 2048.0 | grad norm: 89875.557 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6710/ 159576 | consumed samples: 233600 | elapsed time per iteration (ms): 17478.2 | learning rate: 6.000E-05 | global batch size: 80 | lm loss: 6.318744E+00 | loss scale: 2048.0 | grad norm: 75317.404 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
[2021-09-25 06:55:50] PULSE: tr8-104B is scheduled to start in 1 day, 10:16:13 (at 2021-09-26T17:12:04) (1188168 on 'gpu_p13' partition)
[2021-09-25 06:55:50] PULSE: tr8-104B is waiting for the previous Job Array job to finish before scheduling a new one (1185639_[2-10%1] on 'gpu_p13' partition)
[2021-09-25 06:55:50] PULSE: tr8-104B is running for 2:28:49 since 2021-09-25T04:27:01 (1185639_1 on 'gpu_p13' partition (r6i5n[7-8],r6i6n0,r7i2n[4-5],r7i3n2,r7i6n[2-4],r7i7n[7-8],r8i0n[0,2-3,5-8],r8i1n[0,2-4],r8i2n8,r8i3n[0-2],r8i5n[3-4],r8i7n[3-8],r9i0n[0-5],r9i1n[0-3],r9i2n[3-6,8],r9i3n[0-1,7-8],r9i4n[0-3],r9i5n[3-8],r9i6n0)
iteration 6720/ 159576 | consumed samples: 234400 | elapsed time per iteration (ms): 17509.5 | learning rate: 6.000E-05 | global batch size: 80 | lm loss: 6.297193E+00 | loss scale: 2048.0 | grad norm: 136372.702 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6730/ 159576 | consumed samples: 235200 | elapsed time per iteration (ms): 17514.2 | learning rate: 6.000E-05 | global batch size: 80 | lm loss: 6.303332E+00 | loss scale: 2048.0 | grad norm: 84302.661 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6740/ 159576 | consumed samples: 236000 | elapsed time per iteration (ms): 17530.2 | learning rate: 6.000E-05 | global batch size: 80 | lm loss: 6.327809E+00 | loss scale: 2048.0 | grad norm: 84736.807 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6750/ 159576 | consumed samples: 236912 | elapsed time per iteration (ms): 18323.3 | learning rate: 6.000E-05 | global batch size: 96 | lm loss: 6.320579E+00 | loss scale: 2048.0 | grad norm: 68855.991 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
[2021-09-25 07:08:59] PULSE: tr8-104B is scheduled to start in 19:13:17 (at 2021-09-26T02:22:17) (1188168 on 'gpu_p13' partition)
[2021-09-25 07:08:59] PULSE: tr8-104B is waiting for the previous Job Array job to finish before scheduling a new one (1185639_[2-10%1] on 'gpu_p13' partition)
[2021-09-25 07:08:59] PULSE: tr8-104B is running for 2:41:58 since 2021-09-25T04:27:01 (1185639_1 on 'gpu_p13' partition (r6i5n[7-8],r6i6n0,r7i2n[4-5],r7i3n2,r7i6n[2-4],r7i7n[7-8],r8i0n[0,2-3,5-8],r8i1n[0,2-4],r8i2n8,r8i3n[0-2],r8i5n[3-4],r8i7n[3-8],r9i0n[0-5],r9i1n[0-3],r9i2n[3-6,8],r9i3n[0-1,7-8],r9i4n[0-3],r9i5n[3-8],r9i6n0)
iteration 6760/ 159576 | consumed samples: 237872 | elapsed time per iteration (ms): 18776.3 | learning rate: 6.000E-05 | global batch size: 96 | lm loss: 6.303013E+00 | loss scale: 2048.0 | grad norm: 69740.116 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6770/ 159576 | consumed samples: 238832 | elapsed time per iteration (ms): 18675.5 | learning rate: 6.000E-05 | global batch size: 96 | lm loss: 6.319376E+00 | loss scale: 2048.0 | grad norm: 83900.872 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6780/ 159576 | consumed samples: 239792 | elapsed time per iteration (ms): 18605.9 | learning rate: 6.000E-05 | global batch size: 96 | lm loss: 6.336406E+00 | loss scale: 2048.0 | grad norm: 62443.554 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6790/ 159576 | consumed samples: 240752 | elapsed time per iteration (ms): 18746.1 | learning rate: 6.000E-05 | global batch size: 96 | lm loss: 6.333478E+00 | loss scale: 2048.0 | grad norm: 73606.128 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6800/ 159576 | consumed samples: 241712 | elapsed time per iteration (ms): 18688.5 | learning rate: 6.000E-05 | global batch size: 96 | lm loss: 6.336754E+00 | loss scale: 2048.0 | grad norm: 96323.491 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6810/ 159576 | consumed samples: 242672 | elapsed time per iteration (ms): 18568.8 | learning rate: 6.000E-05 | global batch size: 96 | lm loss: 6.315503E+00 | loss scale: 2048.0 | grad norm: 65008.365 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6820/ 159576 | consumed samples: 243632 | elapsed time per iteration (ms): 18731.9 | learning rate: 6.000E-05 | global batch size: 96 | lm loss: 6.301308E+00 | loss scale: 2048.0 | grad norm: 70887.665 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6830/ 159576 | consumed samples: 244592 | elapsed time per iteration (ms): 18612.7 | learning rate: 6.000E-05 | global batch size: 96 | lm loss: 6.331754E+00 | loss scale: 2048.0 | grad norm: 78393.887 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6840/ 159576 | consumed samples: 245552 | elapsed time per iteration (ms): 18584.4 | learning rate: 6.000E-05 | global batch size: 96 | lm loss: 6.318947E+00 | loss scale: 4096.0 | grad norm: 175812.475 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6850/ 159576 | consumed samples: 246512 | elapsed time per iteration (ms): 18855.7 | learning rate: 6.000E-05 | global batch size: 96 | lm loss: 6.349559E+00 | loss scale: 4096.0 | grad norm: 150858.899 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6860/ 159576 | consumed samples: 247472 | elapsed time per iteration (ms): 18778.5 | learning rate: 6.000E-05 | global batch size: 96 | lm loss: 6.341676E+00 | loss scale: 4096.0 |
grad norm: 374400.560 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 6870/ 159576 | consumed samples: 248432 | elapsed time per iteration (ms): 18648.3 | learning rate: 6.000E-05 | global batch size: 96 | lm loss: 6.313033E+00 | loss scale: 4096.0 | grad norm: 153615.195 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 6880/ 159576 | consumed samples: 249392 | elapsed time per iteration (ms): 18783.0 | learning rate: 6.000E-05 | global batch size: 96 | lm loss: 6.332200E+00 | loss scale: 4096.0 | grad norm: 135045.488 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 6890/ 159576 | consumed samples: 250352 | elapsed time per iteration (ms): 18757.2 | learning rate: 6.000E-05 | global batch size: 96 | lm loss: 6.370442E+00 | loss scale: 4096.0 | grad norm: 140003.151 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 6900/ 159576 | consumed samples: 251312 | elapsed time per iteration (ms): 18547.7 | learning rate: 6.000E-05 | global batch size: 96 | lm loss: 6.426891E+00 | loss scale: 4096.0 | grad norm: 166603.752 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 6910/ 159576 | consumed samples: 252272 | elapsed time per iteration (ms): 18775.5 | learning rate: 6.000E-05 | global batch size: 96 | lm loss: 6.383529E+00 | loss scale: 4096.0 | grad norm: 161102.692 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 6920/ 159576 | consumed samples: 253232 | elapsed time per iteration (ms): 18674.9 | learning rate: 6.000E-05 | global batch size: 96 | lm loss: 6.362777E+00 | loss scale: 4096.0 | grad norm: 135239.756 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 6930/ 159576 | consumed samples: 
254192 | elapsed time per iteration (ms): 18723.1 | learning rate: 6.000E-05 | global batch size: 96 | lm loss: 6.348313E+00 | loss scale: 4096.0 | grad norm: 180298.634 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 6940/ 159576 | consumed samples: 255152 | elapsed time per iteration (ms): 18629.7 | learning rate: 6.000E-05 | global batch size: 96 | lm loss: 6.304693E+00 | loss scale: 4096.0 | grad norm: 155481.632 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 6950/ 159576 | consumed samples: 256112 | elapsed time per iteration (ms): 18736.2 | learning rate: 6.000E-05 | global batch size: 96 | lm loss: 6.335081E+00 | loss scale: 4096.0 | grad norm: 170157.683 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) [2021-09-25 08:09:15] PULSE: tr8-104B is scheduled to start in 18:13:01 (at 2021-09-26T02:22:17) (1188168 on 'gpu_p13' partition) [2021-09-25 08:09:15] PULSE: tr8-104B is waiting for the previous Job Array job to finish before scheduling a new one (1185639_[2-10%1] on 'gpu_p13' partition) [2021-09-25 08:09:15] PULSE: tr8-104B is running for 3:42:14 since 2021-09-25T04:27:01 (1185639_1 on 'gpu_p13' partition (r6i5n[7-8],r6i6n0,r7i2n[4-5],r7i3n2,r7i6n[2-4],r7i7n[7-8],r8i0n[0,2-3,5-8],r8i1n[0,2-4],r8i2n8,r8i3n[0-2],r8i5n[3-4],r8i7n[3-8],r9i0n[0-5],r9i1n[0-3],r9i2n[3-6,8],r9i3n[0-1,7-8],r9i4n[0-3],r9i5n[3-8],r9i6n0) iteration 6960/ 159576 | consumed samples: 257072 | elapsed time per iteration (ms): 18679.3 | learning rate: 6.000E-05 | global batch size: 96 | lm loss: 6.350162E+00 | loss scale: 4096.0 | grad norm: 146048.789 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 6970/ 159576 | consumed samples: 258032 | elapsed time per iteration (ms): 17405.9 | learning rate: 6.000E-05 | global batch size: 96 | lm loss: 6.358824E+00 | loss scale: 2048.0 | grad 
norm: 83822.155 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 6980/ 159576 | consumed samples: 258992 | elapsed time per iteration (ms): 18714.5 | learning rate: 6.000E-05 | global batch size: 96 | lm loss: 6.327154E+00 | loss scale: 2048.0 | grad norm: 55012.450 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 6990/ 159576 | consumed samples: 259952 | elapsed time per iteration (ms): 18649.4 | learning rate: 6.000E-05 | global batch size: 96 | lm loss: 6.344659E+00 | loss scale: 2048.0 | grad norm: 62132.618 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 7000/ 159576 | consumed samples: 260912 | elapsed time per iteration (ms): 18706.1 | learning rate: 6.000E-05 | global batch size: 96 | lm loss: 6.444662E+00 | loss scale: 2048.0 | grad norm: 98258.265 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) ------------------------------------------------------------------------------------------------ validation loss at iteration 7000 | lm loss value: 7.174200E+00 | lm loss PPL: 1.305315E+03 | ------------------------------------------------------------------------------------------------ iteration 7010/ 159576 | consumed samples: 261872 | elapsed time per iteration (ms): 19904.0 | learning rate: 6.000E-05 | global batch size: 96 | lm loss: 1.142026E+01 | loss scale: 2048.0 | grad norm: 219645.978 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 7020/ 159576 | consumed samples: 262832 | elapsed time per iteration (ms): 18580.7 | learning rate: 6.000E-05 | global batch size: 96 | lm loss: 1.367010E+01 | loss scale: 2048.0 | grad norm: 223286.170 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) [2021-09-25 08:32:28] PULSE: tr8-104B is scheduled to start in 17:49:48 (at 
2021-09-26T02:22:17) (1188168 on 'gpu_p13' partition) [2021-09-25 08:32:28] PULSE: tr8-104B is waiting for the previous Job Array job to finish before scheduling a new one (1185639_[2-10%1] on 'gpu_p13' partition) [2021-09-25 08:32:28] PULSE: tr8-104B is running for 4:05:27 since 2021-09-25T04:27:01 (1185639_1 on 'gpu_p13' partition (r6i5n[7-8],r6i6n0,r7i2n[4-5],r7i3n2,r7i6n[2-4],r7i7n[7-8],r8i0n[0,2-3,5-8],r8i1n[0,2-4],r8i2n8,r8i3n[0-2],r8i5n[3-4],r8i7n[3-8],r9i0n[0-5],r9i1n[0-3],r9i2n[3-6,8],r9i3n[0-1,7-8],r9i4n[0-3],r9i5n[3-8],r9i6n0) iteration 7030/ 159576 | consumed samples: 263792 | elapsed time per iteration (ms): 18402.7 | learning rate: 6.000E-05 | global batch size: 96 | lm loss: 1.182180E+01 | loss scale: 2048.0 | grad norm: 19931.456 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 7040/ 159576 | consumed samples: 264752 | elapsed time per iteration (ms): 18461.7 | learning rate: 6.000E-05 | global batch size: 96 | lm loss: 9.981701E+00 | loss scale: 2048.0 | grad norm: 205737.088 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 7050/ 159576 | consumed samples: 265712 | elapsed time per iteration (ms): 18431.2 | learning rate: 6.000E-05 | global batch size: 96 | lm loss: 9.425107E+00 | loss scale: 2048.0 | grad norm: 195793.297 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 7060/ 159576 | consumed samples: 266672 | elapsed time per iteration (ms): 18498.9 | learning rate: 6.000E-05 | global batch size: 96 | lm loss: 8.606621E+00 | loss scale: 2048.0 | grad norm: 50379.603 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 7070/ 159576 | consumed samples: 267632 | elapsed time per iteration (ms): 18340.3 | learning rate: 6.000E-05 | global batch size: 96 | lm loss: 8.027315E+00 | loss scale: 2048.0 | grad norm: 37173.058 | num zeros: 0.0 | 
number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 7080/ 159576 | consumed samples: 268592 | elapsed time per iteration (ms): 18563.4 | learning rate: 6.000E-05 | global batch size: 96 | lm loss: 7.726066E+00 | loss scale: 2048.0 | grad norm: 22946.689 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 7090/ 159576 | consumed samples: 269552 | elapsed time per iteration (ms): 18408.0 | learning rate: 6.000E-05 | global batch size: 96 | lm loss: 7.553810E+00 | loss scale: 2048.0 | grad norm: 16048.807 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 7100/ 159576 | consumed samples: 270512 | elapsed time per iteration (ms): 18353.7 | learning rate: 6.000E-05 | global batch size: 96 | lm loss: 7.394469E+00 | loss scale: 2048.0 | grad norm: 10766.157 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) [2021-09-25 08:57:55] PULSE: tr8-104B is scheduled to start in 17:24:21 (at 2021-09-26T02:22:17) (1188168 on 'gpu_p13' partition) [2021-09-25 08:57:55] PULSE: tr8-104B is waiting for the previous Job Array job to finish before scheduling a new one (1185639_[2-10%1] on 'gpu_p13' partition) [2021-09-25 08:57:55] PULSE: tr8-104B is running for 4:30:54 since 2021-09-25T04:27:01 (1185639_1 on 'gpu_p13' partition (r6i5n[7-8],r6i6n0,r7i2n[4-5],r7i3n2,r7i6n[2-4],r7i7n[7-8],r8i0n[0,2-3,5-8],r8i1n[0,2-4],r8i2n8,r8i3n[0-2],r8i5n[3-4],r8i7n[3-8],r9i0n[0-5],r9i1n[0-3],r9i2n[3-6,8],r9i3n[0-1,7-8],r9i4n[0-3],r9i5n[3-8],r9i6n0) iteration 7110/ 159576 | consumed samples: 271472 | elapsed time per iteration (ms): 18511.6 | learning rate: 6.000E-05 | global batch size: 96 | lm loss: 7.327065E+00 | loss scale: 2048.0 | grad norm: 25940.869 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 7120/ 159576 | consumed samples: 272432 | elapsed time per iteration (ms): 
18333.5 | learning rate: 6.000E-05 | global batch size: 96 | lm loss: 7.337917E+00 | loss scale: 2048.0 | grad norm: 18319.505 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 7130/ 159576 | consumed samples: 273392 | elapsed time per iteration (ms): 18249.8 | learning rate: 6.000E-05 | global batch size: 96 | lm loss: 7.273988E+00 | loss scale: 2048.0 | grad norm: 14331.807 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 7140/ 159576 | consumed samples: 274352 | elapsed time per iteration (ms): 18274.7 | learning rate: 6.000E-05 | global batch size: 96 | lm loss: 7.204887E+00 | loss scale: 2048.0 | grad norm: 21767.712 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) [2021-09-25 09:09:21] PULSE: tr8-104B is scheduled to start in 17:12:55 (at 2021-09-26T02:22:17) (1188168 on 'gpu_p13' partition) [2021-09-25 09:09:21] PULSE: tr8-104B is waiting for the previous Job Array job to finish before scheduling a new one (1185639_[2-10%1] on 'gpu_p13' partition) [2021-09-25 09:09:21] PULSE: tr8-104B is running for 4:42:20 since 2021-09-25T04:27:01 (1185639_1 on 'gpu_p13' partition (r6i5n[7-8],r6i6n0,r7i2n[4-5],r7i3n2,r7i6n[2-4],r7i7n[7-8],r8i0n[0,2-3,5-8],r8i1n[0,2-4],r8i2n8,r8i3n[0-2],r8i5n[3-4],r8i7n[3-8],r9i0n[0-5],r9i1n[0-3],r9i2n[3-6,8],r9i3n[0-1,7-8],r9i4n[0-3],r9i5n[3-8],r9i6n0) iteration 7150/ 159576 | consumed samples: 275312 | elapsed time per iteration (ms): 18318.7 | learning rate: 6.000E-05 | global batch size: 96 | lm loss: 7.195872E+00 | loss scale: 2048.0 | grad norm: 14010.173 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 7160/ 159576 | consumed samples: 276272 | elapsed time per iteration (ms): 18337.2 | learning rate: 6.000E-05 | global batch size: 96 | lm loss: 7.136990E+00 | loss scale: 2048.0 | grad norm: 23189.415 | num zeros: 0.0 | number of 
skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 7170/ 159576 | consumed samples: 277232 | elapsed time per iteration (ms): 18344.7 | learning rate: 6.000E-05 | global batch size: 96 | lm loss: 7.222323E+00 | loss scale: 2048.0 | grad norm: 22610.297 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 7180/ 159576 | consumed samples: 278192 | elapsed time per iteration (ms): 18312.6 | learning rate: 6.000E-05 | global batch size: 96 | lm loss: 7.156533E+00 | loss scale: 2048.0 | grad norm: 12376.987 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 7190/ 159576 | consumed samples: 279152 | elapsed time per iteration (ms): 18417.7 | learning rate: 6.000E-05 | global batch size: 96 | lm loss: 7.084262E+00 | loss scale: 2048.0 | grad norm: 38647.390 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 7200/ 159576 | consumed samples: 280112 | elapsed time per iteration (ms): 18396.8 | learning rate: 6.000E-05 | global batch size: 96 | lm loss: 7.110893E+00 | loss scale: 2048.0 | grad norm: 21520.416 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 7210/ 159576 | consumed samples: 281072 | elapsed time per iteration (ms): 18408.8 | learning rate: 6.000E-05 | global batch size: 96 | lm loss: 7.294872E+00 | loss scale: 2048.0 | grad norm: 77171.242 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 7220/ 159576 | consumed samples: 282032 | elapsed time per iteration (ms): 18333.4 | learning rate: 6.000E-05 | global batch size: 96 | lm loss: 7.155109E+00 | loss scale: 2048.0 | grad norm: 16921.991 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 7230/ 159576 | consumed samples: 282992 | elapsed time per iteration (ms): 18398.5 | learning 
rate: 6.000E-05 | global batch size: 96 | lm loss: 7.042103E+00 | loss scale: 2048.0 | grad norm: 13510.423 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 7240/ 159576 | consumed samples: 284032 | elapsed time per iteration (ms): 19100.0 | learning rate: 6.000E-05 | global batch size: 112 | lm loss: 6.964984E+00 | loss scale: 2048.0 | grad norm: 11355.587 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 7250/ 159576 | consumed samples: 285152 | elapsed time per iteration (ms): 19781.1 | learning rate: 6.000E-05 | global batch size: 112 | lm loss: 7.051522E+00 | loss scale: 2048.0 | grad norm: 14836.710 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 7260/ 159576 | consumed samples: 286272 | elapsed time per iteration (ms): 19836.2 | learning rate: 6.000E-05 | global batch size: 112 | lm loss: 7.050404E+00 | loss scale: 2048.0 | grad norm: 32092.591 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 7270/ 159576 | consumed samples: 287392 | elapsed time per iteration (ms): 19719.8 | learning rate: 6.000E-05 | global batch size: 112 | lm loss: 7.034865E+00 | loss scale: 2048.0 | grad norm: 25809.031 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 7280/ 159576 | consumed samples: 288512 | elapsed time per iteration (ms): 19632.8 | learning rate: 6.000E-05 | global batch size: 112 | lm loss: 7.038512E+00 | loss scale: 2048.0 | grad norm: 19816.017 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 7290/ 159576 | consumed samples: 289632 | elapsed time per iteration (ms): 19704.6 | learning rate: 6.000E-05 | global batch size: 112 | lm loss: 7.051814E+00 | loss scale: 2048.0 | grad norm: 13138.906 | num zeros: 0.0 | number of skipped iterations: 0 | 
number of nan iterations: 0 | time (ms) iteration 7300/ 159576 | consumed samples: 290752 | elapsed time per iteration (ms): 19431.1 | learning rate: 6.000E-05 | global batch size: 112 | lm loss: 6.962708E+00 | loss scale: 2048.0 | grad norm: 15505.241 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 7310/ 159576 | consumed samples: 291872 | elapsed time per iteration (ms): 19625.1 | learning rate: 6.000E-05 | global batch size: 112 | lm loss: 7.068867E+00 | loss scale: 2048.0 | grad norm: 26542.834 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 7320/ 159576 | consumed samples: 292992 | elapsed time per iteration (ms): 19705.6 | learning rate: 6.000E-05 | global batch size: 112 | lm loss: 7.131171E+00 | loss scale: 2048.0 | grad norm: 59185.721 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 7330/ 159576 | consumed samples: 294112 | elapsed time per iteration (ms): 19592.0 | learning rate: 6.000E-05 | global batch size: 112 | lm loss: 7.030576E+00 | loss scale: 2048.0 | grad norm: 32033.660 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) [2021-09-25 10:09:39] PULSE: tr8-104B is scheduled to start in 17:07:05 (at 2021-09-26T03:16:45) (1188168 on 'gpu_p13' partition) [2021-09-25 10:09:39] PULSE: tr8-104B is waiting for the previous Job Array job to finish before scheduling a new one (1185639_[2-10%1] on 'gpu_p13' partition) [2021-09-25 10:09:39] PULSE: tr8-104B is running for 5:42:38 since 2021-09-25T04:27:01 (1185639_1 on 'gpu_p13' partition (r6i5n[7-8],r6i6n0,r7i2n[4-5],r7i3n2,r7i6n[2-4],r7i7n[7-8],r8i0n[0,2-3,5-8],r8i1n[0,2-4],r8i2n8,r8i3n[0-2],r8i5n[3-4],r8i7n[3-8],r9i0n[0-5],r9i1n[0-3],r9i2n[3-6,8],r9i3n[0-1,7-8],r9i4n[0-3],r9i5n[3-8],r9i6n0) iteration 7340/ 159576 | consumed samples: 295232 | elapsed time per iteration (ms): 19566.4 | learning rate: 
6.000E-05 | global batch size: 112 | lm loss: 6.981178E+00 | loss scale: 2048.0 | grad norm: 29317.971 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 7350/ 159576 | consumed samples: 296352 | elapsed time per iteration (ms): 19494.3 | learning rate: 6.000E-05 | global batch size: 112 | lm loss: 6.969751E+00 | loss scale: 2048.0 | grad norm: 20774.916 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 7360/ 159576 | consumed samples: 297472 | elapsed time per iteration (ms): 19789.2 | learning rate: 6.000E-05 | global batch size: 112 | lm loss: 6.939532E+00 | loss scale: 2048.0 | grad norm: 22939.531 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 7370/ 159576 | consumed samples: 298592 | elapsed time per iteration (ms): 19854.7 | learning rate: 6.000E-05 | global batch size: 112 | lm loss: 6.888672E+00 | loss scale: 2048.0 | grad norm: 30762.881 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 7380/ 159576 | consumed samples: 299712 | elapsed time per iteration (ms): 19888.4 | learning rate: 6.000E-05 | global batch size: 112 | lm loss: 6.906486E+00 | loss scale: 2048.0 | grad norm: 18438.642 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 7390/ 159576 | consumed samples: 300832 | elapsed time per iteration (ms): 19703.2 | learning rate: 6.000E-05 | global batch size: 112 | lm loss: 6.877617E+00 | loss scale: 2048.0 | grad norm: 15185.355 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 7400/ 159576 | consumed samples: 301952 | elapsed time per iteration (ms): 19654.2 | learning rate: 6.000E-05 | global batch size: 112 | lm loss: 6.854189E+00 | loss scale: 2048.0 | grad norm: 15960.831 | num zeros: 0.0 | number of skipped iterations: 0 | number 
of nan iterations: 0 | time (ms) iteration 7410/ 159576 | consumed samples: 303072 | elapsed time per iteration (ms): 19528.4 | learning rate: 6.000E-05 | global batch size: 112 | lm loss: 6.894382E+00 | loss scale: 2048.0 | grad norm: 12842.484 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 7420/ 159576 | consumed samples: 304192 | elapsed time per iteration (ms): 19701.7 | learning rate: 6.000E-05 | global batch size: 112 | lm loss: 6.860787E+00 | loss scale: 2048.0 | grad norm: 15167.024 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 7430/ 159576 | consumed samples: 305312 | elapsed time per iteration (ms): 19702.1 | learning rate: 6.000E-05 | global batch size: 112 | lm loss: 6.859363E+00 | loss scale: 2048.0 | grad norm: 23062.497 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 7440/ 159576 | consumed samples: 306432 | elapsed time per iteration (ms): 19933.7 | learning rate: 6.000E-05 | global batch size: 112 | lm loss: 6.860333E+00 | loss scale: 2048.0 | grad norm: 32840.662 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 7450/ 159576 | consumed samples: 307552 | elapsed time per iteration (ms): 19857.9 | learning rate: 6.000E-05 | global batch size: 112 | lm loss: 6.824039E+00 | loss scale: 2048.0 | grad norm: 14512.315 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 7460/ 159576 | consumed samples: 308672 | elapsed time per iteration (ms): 19438.9 | learning rate: 6.000E-05 | global batch size: 112 | lm loss: 6.828743E+00 | loss scale: 2048.0 | grad norm: 22065.697 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 7470/ 159576 | consumed samples: 309792 | elapsed time per iteration (ms): 19647.3 | learning rate: 6.000E-05 | global 
batch size: 112 | lm loss: 6.799754E+00 | loss scale: 4096.0 | grad norm: 49640.058 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 7480/ 159576 | consumed samples: 310912 | elapsed time per iteration (ms): 19818.5 | learning rate: 6.000E-05 | global batch size: 112 | lm loss: 6.815539E+00 | loss scale: 4096.0 | grad norm: 22148.104 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 7490/ 159576 | consumed samples: 312032 | elapsed time per iteration (ms): 19788.8 | learning rate: 6.000E-05 | global batch size: 112 | lm loss: 6.894387E+00 | loss scale: 4096.0 | grad norm: 36912.117 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 7500/ 159576 | consumed samples: 313152 | elapsed time per iteration (ms): 19799.3 | learning rate: 6.000E-05 | global batch size: 112 | lm loss: 6.841101E+00 | loss scale: 4096.0 | grad norm: 23983.193 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) saving checkpoint at iteration 7500 to /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints [2021-09-25 11:03:46,249] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/global_step7500/mp_rank_00_model_states.pt successfully saved checkpoint at iteration 7500 to /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints time (ms) | save-checkpoint: 18021.67 iteration 7510/ 159576 | consumed samples: 314272 | elapsed time per iteration (ms): 21444.7 | learning rate: 6.000E-05 | global batch size: 112 | lm loss: 6.821138E+00 | loss scale: 4096.0 | grad norm: 27340.598 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) [2021-09-25 11:09:42] PULSE: tr8-104B is scheduled to start in 17:10:43 (at 2021-09-26T04:20:26) (1188168 on 'gpu_p13' partition) [2021-09-25 
11:09:42] PULSE: tr8-104B is waiting for the previous Job Array job to finish before scheduling a new one (1185639_[2-10%1] on 'gpu_p13' partition) [2021-09-25 11:09:42] PULSE: tr8-104B is running for 6:42:41 since 2021-09-25T04:27:01 (1185639_1 on 'gpu_p13' partition (r6i5n[7-8],r6i6n0,r7i2n[4-5],r7i3n2,r7i6n[2-4],r7i7n[7-8],r8i0n[0,2-3,5-8],r8i1n[0,2-4],r8i2n8,r8i3n[0-2],r8i5n[3-4],r8i7n[3-8],r9i0n[0-5],r9i1n[0-3],r9i2n[3-6,8],r9i3n[0-1,7-8],r9i4n[0-3],r9i5n[3-8],r9i6n0) iteration 7520/ 159576 | consumed samples: 315392 | elapsed time per iteration (ms): 19669.6 | learning rate: 6.000E-05 | global batch size: 112 | lm loss: 6.839085E+00 | loss scale: 4096.0 | grad norm: 27168.782 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 7530/ 159576 | consumed samples: 316512 | elapsed time per iteration (ms): 19673.9 | learning rate: 6.000E-05 | global batch size: 112 | lm loss: 6.866766E+00 | loss scale: 4096.0 | grad norm: 35661.716 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 7540/ 159576 | consumed samples: 317632 | elapsed time per iteration (ms): 19547.7 | learning rate: 6.000E-05 | global batch size: 112 | lm loss: 6.895227E+00 | loss scale: 4096.0 | grad norm: 30950.102 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 7550/ 159576 | consumed samples: 318752 | elapsed time per iteration (ms): 19728.4 | learning rate: 6.000E-05 | global batch size: 112 | lm loss: 6.974333E+00 | loss scale: 4096.0 | grad norm: 58146.349 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 7560/ 159576 | consumed samples: 319872 | elapsed time per iteration (ms): 19670.9 | learning rate: 6.000E-05 | global batch size: 112 | lm loss: 6.993269E+00 | loss scale: 4096.0 | grad norm: 59358.983 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | 
time (ms)
iteration 7570/ 159576 | consumed samples: 320992 | elapsed time per iteration (ms): 19932.4 | learning rate: 6.000E-05 | global batch size: 112 | lm loss: 7.018776E+00 | loss scale: 4096.0 | grad norm: 26693.574 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7580/ 159576 | consumed samples: 322112 | elapsed time per iteration (ms): 19801.6 | learning rate: 6.000E-05 | global batch size: 112 | lm loss: 6.954316E+00 | loss scale: 4096.0 | grad norm: 56910.600 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7590/ 159576 | consumed samples: 323232 | elapsed time per iteration (ms): 19757.6 | learning rate: 6.000E-05 | global batch size: 112 | lm loss: 7.019042E+00 | loss scale: 4096.0 | grad norm: 31511.156 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7600/ 159576 | consumed samples: 324352 | elapsed time per iteration (ms): 19717.1 | learning rate: 6.000E-05 | global batch size: 112 | lm loss: 7.002568E+00 | loss scale: 4096.0 | grad norm: 35214.039 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7610/ 159576 | consumed samples: 325472 | elapsed time per iteration (ms): 19801.0 | learning rate: 6.000E-05 | global batch size: 112 | lm loss: 6.968073E+00 | loss scale: 4096.0 | grad norm: 40886.049 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7620/ 159576 | consumed samples: 326592 | elapsed time per iteration (ms): 19491.3 | learning rate: 6.000E-05 | global batch size: 112 | lm loss: 6.959355E+00 | loss scale: 4096.0 | grad norm: 37865.294 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7630/ 159576 | consumed samples: 327712 | elapsed time per iteration (ms): 19606.0 | learning rate: 6.000E-05 | global batch size: 112 | lm loss: 6.927076E+00 | loss scale: 4096.0 | grad norm: 32908.139 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7640/ 159576 | consumed samples: 328832 | elapsed time per iteration (ms): 19669.6 | learning rate: 6.000E-05 | global batch size: 112 | lm loss: 7.079063E+00 | loss scale: 4096.0 | grad norm: 43561.929 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7650/ 159576 | consumed samples: 329952 | elapsed time per iteration (ms): 19813.3 | learning rate: 6.000E-05 | global batch size: 112 | lm loss: 6.977676E+00 | loss scale: 4096.0 | grad norm: 33954.223 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7660/ 159576 | consumed samples: 331120 | elapsed time per iteration (ms): 20182.2 | learning rate: 6.000E-05 | global batch size: 128 | lm loss: 7.071407E+00 | loss scale: 4096.0 | grad norm: 139629.093 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7670/ 159576 | consumed samples: 332400 | elapsed time per iteration (ms): 20921.2 | learning rate: 6.000E-05 | global batch size: 128 | lm loss: 7.133433E+00 | loss scale: 4096.0 | grad norm: 151598.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7680/ 159576 | consumed samples: 333680 | elapsed time per iteration (ms): 20923.7 | learning rate: 6.000E-05 | global batch size: 128 | lm loss: 7.093058E+00 | loss scale: 4096.0 | grad norm: 75854.068 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7690/ 159576 | consumed samples: 334960 | elapsed time per iteration (ms): 20468.2 | learning rate: 6.000E-05 | global batch size: 128 | lm loss: 7.040206E+00 | loss scale: 4096.0 | grad norm: 68735.463 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
[2021-09-25 12:10:01] PULSE: tr8-104B is scheduled to start in 18:54:29 (at 2021-09-26T07:04:31) (1188168 on 'gpu_p13' partition)
[2021-09-25 12:10:01] PULSE: tr8-104B is waiting for the previous Job Array job to finish before scheduling a new one (1185639_[2-10%1] on 'gpu_p13' partition)
[2021-09-25 12:10:01] PULSE: tr8-104B is running for 7:43:00 since 2021-09-25T04:27:01 (1185639_1 on 'gpu_p13' partition (r6i5n[7-8],r6i6n0,r7i2n[4-5],r7i3n2,r7i6n[2-4],r7i7n[7-8],r8i0n[0,2-3,5-8],r8i1n[0,2-4],r8i2n8,r8i3n[0-2],r8i5n[3-4],r8i7n[3-8],r9i0n[0-5],r9i1n[0-3],r9i2n[3-6,8],r9i3n[0-1,7-8],r9i4n[0-3],r9i5n[3-8],r9i6n0)
iteration 7700/ 159576 | consumed samples: 336240 | elapsed time per iteration (ms): 20712.9 | learning rate: 6.000E-05 | global batch size: 128 | lm loss: 6.991071E+00 | loss scale: 4096.0 | grad norm: 49058.974 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7710/ 159576 | consumed samples: 337520 | elapsed time per iteration (ms): 20803.8 | learning rate: 6.000E-05 | global batch size: 128 | lm loss: 6.999660E+00 | loss scale: 4096.0 | grad norm: 50810.796 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7720/ 159576 | consumed samples: 338800 | elapsed time per iteration (ms): 21027.6 | learning rate: 6.000E-05 | global batch size: 128 | lm loss: 7.148920E+00 | loss scale: 4096.0 | grad norm: 34526.386 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7730/ 159576 | consumed samples: 340080 | elapsed time per iteration (ms): 20621.1 | learning rate: 6.000E-05 | global batch size: 128 | lm loss: 6.952879E+00 | loss scale: 4096.0 | grad norm: 46587.607 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7740/ 159576 | consumed samples: 341360 | elapsed time per iteration (ms): 20787.7 | learning rate: 6.000E-05 | global batch size: 128 | lm loss: 7.077150E+00 | loss scale: 4096.0 | grad norm: 53834.886 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7750/ 159576 | consumed samples: 342640 | elapsed time per iteration (ms): 20790.5 | learning rate: 6.000E-05 | global batch size: 128 | lm loss: 7.024051E+00 | loss scale: 4096.0 | grad norm: 108296.631 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7760/ 159576 | consumed samples: 343920 | elapsed time per iteration (ms): 20756.3 | learning rate: 6.000E-05 | global batch size: 128 | lm loss: 7.185934E+00 | loss scale: 4096.0 | grad norm: 40243.918 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7770/ 159576 | consumed samples: 345200 | elapsed time per iteration (ms): 20678.9 | learning rate: 6.000E-05 | global batch size: 128 | lm loss: 7.155985E+00 | loss scale: 4096.0 | grad norm: 45818.733 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7780/ 159576 | consumed samples: 346480 | elapsed time per iteration (ms): 20656.6 | learning rate: 6.000E-05 | global batch size: 128 | lm loss: 7.028696E+00 | loss scale: 4096.0 | grad norm: 54814.681 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7790/ 159576 | consumed samples: 347760 | elapsed time per iteration (ms): 20773.2 | learning rate: 6.000E-05 | global batch size: 128 | lm loss: 6.962093E+00 | loss scale: 4096.0 | grad norm: 57105.334 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7800/ 159576 | consumed samples: 349040 | elapsed time per iteration (ms): 20735.7 | learning rate: 6.000E-05 | global batch size: 128 | lm loss: 7.054767E+00 | loss scale: 4096.0 | grad norm: 74767.367 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7810/ 159576 | consumed samples: 350320 | elapsed time per iteration (ms): 20748.9 | learning rate: 6.000E-05 | global batch size: 128 | lm loss: 6.948767E+00 | loss scale: 4096.0 | grad norm: 103822.696 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7820/ 159576 | consumed samples: 351600 | elapsed time per iteration (ms): 20609.0 | learning rate: 6.000E-05 | global batch size: 128 | lm loss: 6.995116E+00 | loss scale: 4096.0 | grad norm: 70594.913 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7830/ 159576 | consumed samples: 352880 | elapsed time per iteration (ms): 20891.2 | learning rate: 6.000E-05 | global batch size: 128 | lm loss: 7.140380E+00 | loss scale: 4096.0 | grad norm: 50257.684 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7840/ 159576 | consumed samples: 354160 | elapsed time per iteration (ms): 20736.5 | learning rate: 6.000E-05 | global batch size: 128 | lm loss: 7.051595E+00 | loss scale: 4096.0 | grad norm: 62967.110 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7850/ 159576 | consumed samples: 355440 | elapsed time per iteration (ms): 20790.1 | learning rate: 6.000E-05 | global batch size: 128 | lm loss: 6.921895E+00 | loss scale: 4096.0 | grad norm: 104168.914 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7860/ 159576 | consumed samples: 356720 | elapsed time per iteration (ms): 20774.7 | learning rate: 6.000E-05 | global batch size: 128 | lm loss: 7.071528E+00 | loss scale: 4096.0 | grad norm: 193610.451 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7870/ 159576 | consumed samples: 358000 | elapsed time per iteration (ms): 20837.0 | learning rate: 6.000E-05 | global batch size: 128 | lm loss: 7.086633E+00 | loss scale: 4096.0 | grad norm: 56330.990 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
[2021-09-25 13:10:06] PULSE: tr8-104B is scheduled to start in 17:54:24 (at 2021-09-26T07:04:31) (1188168 on 'gpu_p13' partition)
[2021-09-25 13:10:06] PULSE: tr8-104B is waiting for the previous Job Array job to finish before scheduling a new one (1185639_[2-10%1] on 'gpu_p13' partition)
[2021-09-25 13:10:06] PULSE: tr8-104B is running for 8:43:05 since 2021-09-25T04:27:01 (1185639_1 on 'gpu_p13' partition (r6i5n[7-8],r6i6n0,r7i2n[4-5],r7i3n2,r7i6n[2-4],r7i7n[7-8],r8i0n[0,2-3,5-8],r8i1n[0,2-4],r8i2n8,r8i3n[0-2],r8i5n[3-4],r8i7n[3-8],r9i0n[0-5],r9i1n[0-3],r9i2n[3-6,8],r9i3n[0-1,7-8],r9i4n[0-3],r9i5n[3-8],r9i6n0)
iteration 7880/ 159576 | consumed samples: 359280 | elapsed time per iteration (ms): 20746.8 | learning rate: 6.000E-05 | global batch size: 128 | lm loss: 7.156522E+00 | loss scale: 4096.0 | grad norm: 137295.607 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7890/ 159576 | consumed samples: 360560 | elapsed time per iteration (ms): 20983.7 | learning rate: 6.000E-05 | global batch size: 128 | lm loss: 6.996352E+00 | loss scale: 4096.0 | grad norm: 67763.557 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7900/ 159576 | consumed samples: 361840 | elapsed time per iteration (ms): 20640.0 | learning rate: 6.000E-05 | global batch size: 128 | lm loss: 6.985654E+00 | loss scale: 4096.0 | grad norm: 113013.123 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7910/ 159576 | consumed samples: 363120 | elapsed time per iteration (ms): 20742.0 | learning rate: 6.000E-05 | global batch size: 128 | lm loss: 6.976338E+00 | loss scale: 4096.0 | grad norm: 73140.648 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7920/ 159576 | consumed samples: 364400 | elapsed time per iteration (ms): 20679.4 | learning rate: 6.000E-05 | global batch size: 128 | lm loss: 6.917073E+00 | loss scale: 4096.0 | grad norm: 83861.566 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7930/ 159576 | consumed samples: 365680 | elapsed time per iteration (ms): 20531.8 | learning rate: 6.000E-05 | global batch size: 128 | lm loss: 6.971965E+00 | loss scale: 4096.0 | grad norm: 57978.154 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7940/ 159576 | consumed samples: 366960 | elapsed time per iteration (ms): 20446.7 | learning rate: 6.000E-05 | global batch size: 128 | lm loss: 7.117603E+00 | loss scale: 4096.0 | grad norm: 218144.909 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7950/ 159576 | consumed samples: 368240 | elapsed time per iteration (ms): 20823.5 | learning rate: 6.000E-05 | global batch size: 128 | lm loss: 7.029739E+00 | loss scale: 4096.0 | grad norm: 46987.640 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7960/ 159576 | consumed samples: 369520 | elapsed time per iteration (ms): 20775.8 | learning rate: 6.000E-05 | global batch size: 128 | lm loss: 6.972835E+00 | loss scale: 4096.0 | grad norm: 59193.517 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7970/ 159576 | consumed samples: 370800 | elapsed time per iteration (ms): 20508.8 | learning rate: 6.000E-05 | global batch size: 128 | lm loss: 6.890491E+00 | loss scale: 8192.0 | grad norm: 102786.418 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7980/ 159576 | consumed samples: 372080 | elapsed time per iteration (ms): 20983.1 | learning rate: 6.000E-05 | global batch size: 128 | lm loss: 6.927078E+00 | loss scale: 8192.0 | grad norm: 117997.551 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7990/ 159576 | consumed samples: 373360 | elapsed time per iteration (ms): 20495.7 | learning rate: 6.000E-05 | global batch size: 128 | lm loss: 6.823578E+00 | loss scale: 8192.0 | grad norm: 123947.033 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
[2021-09-25 13:53:58,625] [INFO] [logging.py:68:log_dist] [Rank 0] step=8000, skipped=17, lr=[5.999979430007177e-05, 5.999979430007177e-05], mom=[(0.9, 0.999), (0.9, 0.999)]
steps: 8000 loss: 6.8207 iter time (s): 0.010 samples/sec: 13060.948
iteration 8000/ 159576 | consumed samples: 374640 | elapsed time per iteration (ms): 20659.2 | learning rate: 6.000E-05 | global batch size: 128 | lm loss: 6.884979E+00 | loss scale: 8192.0 | grad norm: 131468.178 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
------------------------------------------------------------------------------------------------
validation loss at iteration 8000 | lm loss value: 6.791678E+00 | lm loss PPL: 8.904064E+02 |
------------------------------------------------------------------------------------------------
iteration 8010/ 159576 | consumed samples: 375920 | elapsed time per iteration (ms): 22008.9 | learning rate: 6.000E-05 | global batch size: 128 | lm loss: 6.826038E+00 | loss scale: 8192.0 | grad norm: 154245.241 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 8020/ 159576 | consumed samples: 377200 | elapsed time per iteration (ms): 20587.9 | learning rate: 6.000E-05 | global batch size: 128 | lm loss: 6.870419E+00 | loss scale: 8192.0 | grad norm: 129858.542 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 8030/ 159576 | consumed samples: 378544 | elapsed time per iteration (ms): 21288.4 | learning rate: 6.000E-05 | global batch size: 144 | lm loss: 6.928481E+00 | loss scale: 8192.0 | grad norm: 226677.481 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 8040/ 159576 | consumed samples: 379984 | elapsed time per iteration (ms): 21881.6 | learning rate: 6.000E-05 | global batch size: 144 | lm loss: 6.896291E+00 | loss scale: 8192.0 | grad norm: 205623.823 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
[2021-09-25 14:10:08] PULSE: tr8-104B is scheduled to start in 17:26:04 (at 2021-09-26T07:36:13) (1188168 on 'gpu_p13' partition)
[2021-09-25 14:10:08] PULSE: tr8-104B is waiting for the previous Job Array job to finish before scheduling a new one (1185639_[2-10%1] on 'gpu_p13' partition)
[2021-09-25 14:10:08] PULSE: tr8-104B is running for 9:43:07 since 2021-09-25T04:27:01 (1185639_1 on 'gpu_p13' partition (r6i5n[7-8],r6i6n0,r7i2n[4-5],r7i3n2,r7i6n[2-4],r7i7n[7-8],r8i0n[0,2-3,5-8],r8i1n[0,2-4],r8i2n8,r8i3n[0-2],r8i5n[3-4],r8i7n[3-8],r9i0n[0-5],r9i1n[0-3],r9i2n[3-6,8],r9i3n[0-1,7-8],r9i4n[0-3],r9i5n[3-8],r9i6n0)
iteration 8050/ 159576 | consumed samples: 381424 | elapsed time per iteration (ms): 21696.5 | learning rate: 6.000E-05 | global batch size: 144 | lm loss: 6.873873E+00 | loss scale: 8192.0 | grad norm: 146153.031 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 8060/ 159576 | consumed samples: 382864 | elapsed time per iteration (ms): 21810.7 | learning rate: 6.000E-05 | global batch size: 144 | lm loss: 6.853185E+00 | loss scale: 8192.0 | grad norm: 101607.158 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 8070/ 159576 | consumed samples: 384304 | elapsed time per iteration (ms): 21802.4 | learning rate: 6.000E-05 | global batch size: 144 | lm loss: 6.850246E+00 | loss scale: 8192.0 | grad norm: 139070.087 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 8080/ 159576 | consumed samples: 385744 | elapsed time per iteration (ms): 21831.7 | learning rate: 6.000E-05 | global batch size: 144 | lm loss: 6.848817E+00 | loss scale: 8192.0 | grad norm: 129639.082 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 8090/ 159576 | consumed samples: 387184 | elapsed time per iteration (ms): 21715.3 | learning rate: 6.000E-05 | global batch size: 144 | lm loss: 6.856639E+00 | loss scale: 8192.0 | grad norm: 200364.806 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 8100/ 159576 | consumed samples: 388624 | elapsed time per iteration (ms): 21801.4 | learning rate: 6.000E-05 | global batch size: 144 | lm loss: 6.869398E+00 | loss scale: 8192.0 | grad norm: 141893.384 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 8110/ 159576 | consumed samples: 390064 | elapsed time per iteration (ms): 21693.5 | learning rate: 6.000E-05 | global batch size: 144 | lm loss: 6.834469E+00 | loss scale: 8192.0 | grad norm: 133792.650 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 8120/ 159576 | consumed samples: 391504 | elapsed time per iteration (ms): 21798.3 | learning rate: 6.000E-05 | global batch size: 144 | lm loss: 6.845126E+00 | loss scale: 8192.0 | grad norm: 196465.435 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 8130/ 159576 | consumed samples: 392944 | elapsed time per iteration (ms): 21718.4 | learning rate: 6.000E-05 | global batch size: 144 | lm loss: 6.864041E+00 | loss scale: 8192.0 | grad norm: 234002.522 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 8140/ 159576 | consumed samples: 394384 | elapsed time per iteration (ms): 20974.7 | learning rate: 6.000E-05 | global batch size: 144 | lm loss: 6.866895E+00 | loss scale: 8192.0 | grad norm: 214792.051 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 8150/ 159576 | consumed samples: 395824 | elapsed time per iteration (ms): 20962.3 | learning rate: 6.000E-05 | global batch size: 144 | lm loss: 6.949483E+00 | loss scale: 4096.0 | grad norm: 129105.294 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 8160/ 159576 | consumed samples: 397264 | elapsed time per iteration (ms): 21839.6 | learning rate: 6.000E-05 | global batch size: 144 | lm loss: 6.982524E+00 | loss scale: 4096.0 | grad norm: 104094.455 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 8170/ 159576 | consumed samples: 398704 | elapsed time per iteration (ms): 21626.3 | learning rate: 6.000E-05 | global batch size: 144 | lm loss: 6.968035E+00 | loss scale: 4096.0 | grad norm: 85705.545 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 8180/ 159576 | consumed samples: 400144 | elapsed time per iteration (ms): 21733.4 | learning rate: 6.000E-05 | global batch size: 144 | lm loss: 6.983526E+00 | loss scale: 4096.0 | grad norm: 140563.515 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 8190/ 159576 | consumed samples: 401584 | elapsed time per iteration (ms): 21768.5 | learning rate: 6.000E-05 | global batch size: 144 | lm loss: 7.016048E+00 | loss scale: 4096.0 | grad norm: 72531.033 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 8200/ 159576 | consumed samples: 403024 | elapsed time per iteration (ms): 21929.8 | learning rate: 6.000E-05 | global batch size: 144 | lm loss: 6.996774E+00 | loss scale: 4096.0 | grad norm: 128628.095 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 8210/ 159576 | consumed samples: 404464 | elapsed time per iteration (ms): 21876.8 | learning rate: 6.000E-05 | global batch size: 144 | lm loss: 6.954953E+00 | loss scale: 4096.0 | grad norm: 114237.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
[2021-09-25 15:10:12] PULSE: tr8-104B is scheduled to start in 20:25:18 (at 2021-09-26T11:35:31) (1188168 on 'gpu_p13' partition)
[2021-09-25 15:10:12] PULSE: tr8-104B is waiting for the previous Job Array job to finish before scheduling a new one (1185639_[2-10%1] on 'gpu_p13' partition)
[2021-09-25 15:10:12] PULSE: tr8-104B is running for 10:43:11 since 2021-09-25T04:27:01 (1185639_1 on 'gpu_p13' partition (r6i5n[7-8],r6i6n0,r7i2n[4-5],r7i3n2,r7i6n[2-4],r7i7n[7-8],r8i0n[0,2-3,5-8],r8i1n[0,2-4],r8i2n8,r8i3n[0-2],r8i5n[3-4],r8i7n[3-8],r9i0n[0-5],r9i1n[0-3],r9i2n[3-6,8],r9i3n[0-1,7-8],r9i4n[0-3],r9i5n[3-8],r9i6n0)
iteration 8220/ 159576 | consumed samples: 405904 | elapsed time per iteration (ms): 21992.9 | learning rate: 6.000E-05 | global batch size: 144 | lm loss: 6.927856E+00 | loss scale: 4096.0 | grad norm: 191859.936 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 8230/ 159576 | consumed samples: 407344 | elapsed time per iteration (ms): 21845.4 | learning rate: 6.000E-05 | global batch size: 144 | lm loss: 6.915263E+00 | loss scale: 4096.0 | grad norm: 136325.623 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 8240/ 159576 | consumed samples: 408784 | elapsed time per iteration (ms): 21179.2 | learning rate: 6.000E-05 | global batch size: 144 | lm loss: 6.864025E+00 | loss scale: 2048.0 | grad norm: 118355.574 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 8250/ 159576 | consumed samples: 410224 | elapsed time per iteration (ms): 21688.2 | learning rate: 6.000E-05 | global batch size: 144 | lm loss: 6.873029E+00 | loss scale: 2048.0 | grad norm: 72612.289 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 8260/ 159576 | consumed samples: 411664 | elapsed time per iteration (ms): 21621.0 | learning rate: 6.000E-05 | global batch size: 144 | lm loss: 6.963725E+00 | loss scale: 2048.0 | grad norm: 77677.833 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 8270/ 159576 | consumed samples: 413104 | elapsed time per iteration (ms): 21832.0 | learning rate: 6.000E-05 | global batch size: 144 | lm loss: 6.939199E+00 | loss scale: 2048.0 | grad norm: 80021.251 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 8280/ 159576 | consumed samples: 414544 | elapsed time per iteration (ms): 21967.3 | learning rate: 6.000E-05 | global batch size: 144 | lm loss: 6.919482E+00 | loss scale: 2048.0 | grad norm: 58905.568 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 8290/ 159576 | consumed samples: 415984 | elapsed time per iteration (ms): 21671.6 | learning rate: 6.000E-05 | global batch size: 144 | lm loss: 6.919662E+00 | loss scale: 2048.0 | grad norm: 52571.274 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 8300/ 159576 | consumed samples: 417424 | elapsed time per iteration (ms): 21755.6 | learning rate: 6.000E-05 | global batch size: 144 | lm loss: 7.024297E+00 | loss scale: 2048.0 | grad norm: 77079.083 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 8310/ 159576 | consumed samples: 418864 | elapsed time per iteration (ms): 21909.8 | learning rate: 6.000E-05 | global batch size: 144 | lm loss: 7.234490E+00 | loss scale: 2048.0 | grad norm: 102216.544 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 8320/ 159576 | consumed samples: 420304 | elapsed time per iteration (ms): 21566.6 | learning rate: 6.000E-05 | global batch size: 144 | lm loss: 7.228243E+00 | loss scale: 2048.0 | grad norm: 88135.536 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 8330/ 159576 | consumed samples: 421744 | elapsed time per iteration (ms): 22069.0 | learning rate: 6.000E-05 | global batch size: 144 | lm loss: 7.068048E+00 | loss scale: 2048.0 | grad norm: 65341.009 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 8340/ 159576 | consumed samples: 423184 | elapsed time per iteration (ms): 21682.1 | learning rate: 6.000E-05 | global batch size: 144 | lm loss: 7.049673E+00 | loss scale: 2048.0 | grad norm: 45586.386 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 8350/ 159576 | consumed samples: 424624 | elapsed time per iteration (ms): 21918.1 | learning rate: 6.000E-05 | global batch size: 144 | lm loss: 7.033588E+00 | loss scale: 2048.0 | grad norm: 60230.392 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 8360/ 159576 | consumed samples: 426160 | elapsed time per iteration (ms): 22474.7 | learning rate: 6.000E-05 | global batch size: 160 | lm loss: 7.032515E+00 | loss scale: 2048.0 | grad norm: 55714.258 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 8370/ 159576 | consumed samples: 427760 | elapsed time per iteration (ms): 22723.0 | learning rate: 6.000E-05 | global batch size: 160 | lm loss: 7.051062E+00 | loss scale: 2048.0 | grad norm: 68784.584 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
[2021-09-25 16:10:22] PULSE: tr8-104B is scheduled to start in 19:16:12 (at 2021-09-26T11:26:35) (1188168 on 'gpu_p13' partition)
[2021-09-25 16:10:22] PULSE: tr8-104B is waiting for the previous Job Array job to finish before scheduling a new one (1185639_[2-10%1] on 'gpu_p13' partition)
[2021-09-25 16:10:22] PULSE: tr8-104B is running for 11:43:21 since 2021-09-25T04:27:01 (1185639_1 on 'gpu_p13' partition (r6i5n[7-8],r6i6n0,r7i2n[4-5],r7i3n2,r7i6n[2-4],r7i7n[7-8],r8i0n[0,2-3,5-8],r8i1n[0,2-4],r8i2n8,r8i3n[0-2],r8i5n[3-4],r8i7n[3-8],r9i0n[0-5],r9i1n[0-3],r9i2n[3-6,8],r9i3n[0-1,7-8],r9i4n[0-3],r9i5n[3-8],r9i6n0)
iteration 8380/ 159576 | consumed samples: 429360 | elapsed time per iteration (ms): 22974.1 | learning rate: 6.000E-05 | global batch size: 160 | lm loss: 7.025337E+00 | loss scale: 2048.0 | grad norm: 89725.468 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 8390/ 159576 | consumed samples: 430960 | elapsed time per iteration (ms): 22266.9 | learning rate: 6.000E-05 | global batch size: 160 | lm loss: 7.010270E+00 | loss scale: 1024.0 | grad norm: 33629.138 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 8400/ 159576 | consumed samples: 432560 | elapsed time per iteration (ms): 22964.2 | learning rate: 6.000E-05 | global batch size: 160 | lm loss: 7.020833E+00 | loss scale: 1024.0 | grad norm: 46812.316 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 8410/ 159576 | consumed samples: 434160 | elapsed time per iteration (ms): 22923.5 | learning rate: 6.000E-05 | global batch size: 160 | lm loss: 7.044554E+00 | loss scale: 1024.0 | grad norm: 55335.802 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 8420/ 159576 | consumed samples: 435760 | elapsed time per iteration (ms): 22690.3 | learning rate: 6.000E-05 | global batch size: 160 | lm loss: 7.074860E+00 | loss scale: 1024.0 | grad norm: 27018.225 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 8430/ 159576 | consumed samples: 437360 | elapsed time per iteration (ms): 22997.6 | learning rate: 6.000E-05 | global batch size: 160 | lm loss: 7.108445E+00 | loss scale: 1024.0 | grad norm: 95058.404 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 8440/ 159576 | consumed samples: 438960 | elapsed time per iteration (ms): 22696.4 | learning rate: 6.000E-05 | global batch size: 160 | lm loss: 7.128921E+00 | loss scale: 1024.0 | grad norm: 44470.175 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 8450/ 159576 | consumed samples: 440560 | elapsed time per iteration (ms): 22728.4 | learning rate: 6.000E-05 | global batch size: 160 | lm loss: 7.037349E+00 | loss scale: 1024.0 | grad norm: 32995.810 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 8460/ 159576 | consumed samples: 442160 | elapsed time per iteration (ms): 22856.0 | learning rate: 6.000E-05 | global batch size: 160 | lm loss: 7.064864E+00 | loss scale: 1024.0 | grad norm: 23093.772 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 8470/ 159576 | consumed samples: 443760 | elapsed time per iteration (ms): 22824.5 | learning rate: 6.000E-05 | global batch size: 160 | lm loss: 7.057752E+00 | loss scale: 1024.0 | grad norm: 34580.324 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 8480/ 159576 | consumed samples: 445360 | elapsed time per iteration (ms): 22939.9 | learning rate: 6.000E-05 | global batch size: 160 | lm loss: 7.111783E+00 | loss scale: 1024.0 | grad norm: 30415.135 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 8490/ 159576 | consumed samples: 446960 | elapsed time per iteration (ms): 22647.3 | learning rate: 6.000E-05 | global batch size: 160 | lm loss: 7.077787E+00 | loss scale: 1024.0 | grad norm: 44228.518 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 8500/ 159576 | consumed samples: 448560 | elapsed time per iteration (ms): 22870.1 | learning rate: 6.000E-05 | global batch size: 160 | lm loss: 7.017307E+00 | loss scale: 1024.0 | grad norm: 31106.331 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
[2021-09-25 17:00:02] PULSE: tr8-104B is scheduled to start in 18:26:32 (at 2021-09-26T11:26:35) (1188168 on 'gpu_p13' partition)
[2021-09-25 17:00:02] PULSE: tr8-104B is waiting for the previous Job Array job to finish before scheduling a new one (1185639_[2-10%1] on 'gpu_p13' partition)
[2021-09-25 17:00:02] PULSE: tr8-104B is running for 12:33:01 since 2021-09-25T04:27:01 (1185639_1 on 'gpu_p13' partition (r6i5n[7-8],r6i6n0,r7i2n[4-5],r7i3n2,r7i6n[2-4],r7i7n[7-8],r8i0n[0,2-3,5-8],r8i1n[0,2-4],r8i2n8,r8i3n[0-2],r8i5n[3-4],r8i7n[3-8],r9i0n[0-5],r9i1n[0-3],r9i2n[3-6,8],r9i3n[0-1,7-8],r9i4n[0-3],r9i5n[3-8],r9i6n0)
iteration 8510/ 159576 | consumed samples: 450160 | elapsed time per iteration (ms): 22836.1 | learning rate: 6.000E-05 | global batch size: 160 | lm loss: 7.033496E+00 | loss scale: 1024.0 | grad norm: 84589.712 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 8520/ 159576 | consumed samples: 451760 | elapsed time per iteration (ms): 22678.6 | learning rate: 6.000E-05 | global batch size: 160 | lm loss: 7.034415E+00 | loss scale: 1024.0 | grad norm: 45889.295 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 8530/ 159576 | consumed samples: 453360 | elapsed time per iteration (ms): 22820.3 | learning rate: 6.000E-05 | global batch size: 160 | lm loss: 7.022775E+00 | loss scale: 1024.0 | grad norm: 46421.613 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
[2021-09-25 17:10:31] PULSE: tr8-104B is scheduled to start in 18:16:03 (at 2021-09-26T11:26:35) (1188168 on 'gpu_p13' partition)
[2021-09-25 17:10:31] PULSE: tr8-104B is running for 12:43:30 since 2021-09-25T04:27:01 (1185639_1 on 'gpu_p13' partition (r6i5n[7-8],r6i6n0,r7i2n[4-5],r7i3n2,r7i6n[2-4],r7i7n[7-8],r8i0n[0,2-3,5-8],r8i1n[0,2-4],r8i2n8,r8i3n[0-2],r8i5n[3-4],r8i7n[3-8],r9i0n[0-5],r9i1n[0-3],r9i2n[3-6,8],r9i3n[0-1,7-8],r9i4n[0-3],r9i5n[3-8],r9i6n0)
iteration 8540/ 159576 | consumed samples: 454960 | elapsed time per iteration (ms): 22803.2 | learning rate: 6.000E-05 | global batch size: 160 | lm loss: 7.015056E+00 | loss scale: 1024.0 | grad norm: 49138.667 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 8550/ 159576 | consumed samples: 456560 | elapsed time per iteration (ms): 22969.4 | learning rate: 6.000E-05 | global batch size: 160 | lm loss: 7.037695E+00 | loss scale: 1024.0 | grad norm: 72675.159 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 8560/ 159576 | consumed samples: 458160 | elapsed time per iteration (ms): 22624.1 | learning rate: 6.000E-05 | global batch size: 160 | lm loss: 7.040105E+00 | loss scale: 1024.0 | grad norm: 55417.219 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 8570/ 159576 | consumed samples: 459760 | elapsed time per iteration (ms): 22663.1 | learning rate: 6.000E-05 | global batch size: 160 | lm loss: 7.066528E+00 | loss scale: 1024.0 | grad norm: 48492.969 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
[2021-09-25 17:26:58] PULSE: tr8-104B is scheduled to start in 17:59:36 (at 2021-09-26T11:26:35) (1188168 on 'gpu_p13' partition)
[2021-09-25 17:26:58] PULSE: tr8-104B is running for 12:59:57 since 2021-09-25T04:27:01 (1185639_1 on 'gpu_p13' partition (r6i5n[7-8],r6i6n0,r7i2n[4-5],r7i3n2,r7i6n[2-4],r7i7n[7-8],r8i0n[0,2-3,5-8],r8i1n[0,2-4],r8i2n8,r8i3n[0-2],r8i5n[3-4],r8i7n[3-8],r9i0n[0-5],r9i1n[0-3],r9i2n[3-6,8],r9i3n[0-1,7-8],r9i4n[0-3],r9i5n[3-8],r9i6n0)
iteration 8580/ 159576 | consumed samples: 461360 | elapsed time per iteration (ms): 22688.8 | learning rate: 6.000E-05 | global batch size: 160 | lm loss: 7.087028E+00 | loss scale: 1024.0 | grad norm: 46974.842 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 8590/ 159576 | consumed samples: 462960 | elapsed time per iteration (ms): 22699.4 | learning rate: 6.000E-05 | global batch size: 160 | lm loss: 7.089204E+00 | loss scale: 1024.0 | grad norm: 44702.862 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 8600/ 159576 | consumed samples: 464560 | elapsed time per iteration (ms): 22777.7 | learning rate: 6.000E-05 | global batch size: 160 | lm loss: 7.149306E+00 | loss scale: 1024.0 | grad norm: 261339.801 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 8610/ 159576 | consumed samples: 466160 | elapsed time per iteration (ms): 22975.5 | learning rate: 6.000E-05 | global batch size: 160 | lm loss: 7.167276E+00 | loss scale: 1024.0 | grad norm: 105455.551 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 8620/ 159576 | consumed samples: 467760 | elapsed time per iteration (ms): 23048.5 | learning rate: 6.000E-05 | global batch size: 160 | lm loss: 7.078442E+00 | loss scale: 1024.0 | grad norm: 84212.423 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 8630/ 159576 | consumed samples: 469360 | elapsed time per iteration (ms): 22799.5 | learning rate: 6.000E-05 | global batch size: 160 | lm loss: 7.081234E+00 | loss scale: 1024.0 | grad norm: 52121.419 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 8640/ 159576 | consumed samples: 470960 | elapsed time per iteration (ms): 22720.5 | learning rate: 6.000E-05 | global batch size: 160 | lm loss: 7.109283E+00 | loss scale: 1024.0 | grad norm: 48651.489 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 8650/ 159576 | consumed samples: 472560 | elapsed time per iteration (ms): 22695.2 | learning rate: 6.000E-05 | global batch size: 160 | lm loss: 7.118199E+00 | loss scale: 1024.0 | grad norm: 26046.891 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 8660/ 159576 | consumed samples: 474320 | elapsed time per iteration (ms): 23933.5 | learning rate: 6.000E-05 | global batch size: 176 | lm loss: 7.064212E+00 | loss scale: 1024.0 | grad norm: 40523.058 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 8670/ 159576 | consumed samples: 476080 | elapsed time per iteration (ms): 23798.1 | learning rate: 6.000E-05 | global batch size: 176 | lm loss: 7.051229E+00 | loss scale: 1024.0 | grad norm: 28160.238 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 8680/ 159576 | consumed samples: 477840 | elapsed time per iteration (ms): 23923.9 | learning rate: 6.000E-05 | global batch size: 176 | lm loss: 7.036906E+00 | loss scale: 1024.0 | grad norm: 51047.866 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 8690/ 159576 | consumed samples: 479600 | elapsed time per iteration (ms): 23651.1 | learning rate: 6.000E-05 | global batch size: 176 | lm loss: 7.073657E+00 | loss scale: 1024.0 | grad norm: 141610.865 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
[2021-09-25 18:10:35] PULSE: tr8-104B is scheduled to start in 17:15:59 (at 2021-09-26T11:26:35) (1188168
on 'gpu_p13' partition) [2021-09-25 18:10:35] PULSE: tr8-104B is running for 13:43:34 since 2021-09-25T04:27:01 (1185639_1 on 'gpu_p13' partition (r6i5n[7-8],r6i6n0,r7i2n[4-5],r7i3n2,r7i6n[2-4],r7i7n[7-8],r8i0n[0,2-3,5-8],r8i1n[0,2-4],r8i2n8,r8i3n[0-2],r8i5n[3-4],r8i7n[3-8],r9i0n[0-5],r9i1n[0-3],r9i2n[3-6,8],r9i3n[0-1,7-8],r9i4n[0-3],r9i5n[3-8],r9i6n0) iteration 8700/ 159576 | consumed samples: 481360 | elapsed time per iteration (ms): 23943.4 | learning rate: 6.000E-05 | global batch size: 176 | lm loss: 7.071510E+00 | loss scale: 1024.0 | grad norm: 24381.440 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 8710/ 159576 | consumed samples: 483120 | elapsed time per iteration (ms): 23910.3 | learning rate: 6.000E-05 | global batch size: 176 | lm loss: 7.190697E+00 | loss scale: 1024.0 | grad norm: 41525.807 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 8720/ 159576 | consumed samples: 484880 | elapsed time per iteration (ms): 23923.5 | learning rate: 6.000E-05 | global batch size: 176 | lm loss: 7.332158E+00 | loss scale: 1024.0 | grad norm: 23580.074 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 8730/ 159576 | consumed samples: 486640 | elapsed time per iteration (ms): 23664.9 | learning rate: 6.000E-05 | global batch size: 176 | lm loss: 7.250137E+00 | loss scale: 1024.0 | grad norm: 33934.114 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 8740/ 159576 | consumed samples: 488400 | elapsed time per iteration (ms): 24002.8 | learning rate: 6.000E-05 | global batch size: 176 | lm loss: 7.134158E+00 | loss scale: 1024.0 | grad norm: 18917.778 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 8750/ 159576 | consumed samples: 490160 | elapsed time per iteration (ms): 23812.9 | learning rate: 
6.000E-05 | global batch size: 176 | lm loss: 7.133132E+00 | loss scale: 1024.0 | grad norm: 24524.875 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 8760/ 159576 | consumed samples: 491920 | elapsed time per iteration (ms): 24164.0 | learning rate: 6.000E-05 | global batch size: 176 | lm loss: 7.089709E+00 | loss scale: 1024.0 | grad norm: 18466.411 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 8770/ 159576 | consumed samples: 493680 | elapsed time per iteration (ms): 23763.0 | learning rate: 6.000E-05 | global batch size: 176 | lm loss: 7.075866E+00 | loss scale: 1024.0 | grad norm: 21160.208 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 8780/ 159576 | consumed samples: 495440 | elapsed time per iteration (ms): 23757.0 | learning rate: 6.000E-05 | global batch size: 176 | lm loss: 7.105405E+00 | loss scale: 1024.0 | grad norm: 21012.399 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 8790/ 159576 | consumed samples: 497200 | elapsed time per iteration (ms): 23726.0 | learning rate: 6.000E-05 | global batch size: 176 | lm loss: 7.119524E+00 | loss scale: 1024.0 | grad norm: 19184.310 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) [2021-09-25 18:51:17] PULSE: tr8-104B is scheduled to start in 19:55:07 (at 2021-09-26T14:46:25) (1188168 on 'gpu_p13' partition) [2021-09-25 18:51:17] PULSE: tr8-104B is running for 14:24:16 since 2021-09-25T04:27:01 (1185639_1 on 'gpu_p13' partition (r6i5n[7-8],r6i6n0,r7i2n[4-5],r7i3n2,r7i6n[2-4],r7i7n[7-8],r8i0n[0,2-3,5-8],r8i1n[0,2-4],r8i2n8,r8i3n[0-2],r8i5n[3-4],r8i7n[3-8],r9i0n[0-5],r9i1n[0-3],r9i2n[3-6,8],r9i3n[0-1,7-8],r9i4n[0-3],r9i5n[3-8],r9i6n0) iteration 8800/ 159576 | consumed samples: 498960 | elapsed time per iteration (ms): 23872.5 | learning rate: 
6.000E-05 | global batch size: 176 | lm loss: 7.150304E+00 | loss scale: 1024.0 | grad norm: 20582.002 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 8810/ 159576 | consumed samples: 500720 | elapsed time per iteration (ms): 23674.3 | learning rate: 6.000E-05 | global batch size: 176 | lm loss: 7.121466E+00 | loss scale: 1024.0 | grad norm: 26026.638 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 8820/ 159576 | consumed samples: 502480 | elapsed time per iteration (ms): 23655.3 | learning rate: 6.000E-05 | global batch size: 176 | lm loss: 7.227619E+00 | loss scale: 1024.0 | grad norm: 19493.231 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 8830/ 159576 | consumed samples: 504240 | elapsed time per iteration (ms): 24040.7 | learning rate: 6.000E-05 | global batch size: 176 | lm loss: 7.202127E+00 | loss scale: 1024.0 | grad norm: 21130.889 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 8840/ 159576 | consumed samples: 506000 | elapsed time per iteration (ms): 23751.6 | learning rate: 6.000E-05 | global batch size: 176 | lm loss: 7.102602E+00 | loss scale: 1024.0 | grad norm: 15258.781 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) [2021-09-25 19:10:38] PULSE: tr8-104B is scheduled to start in 19:35:46 (at 2021-09-26T14:46:25) (1188168 on 'gpu_p13' partition) [2021-09-25 19:10:38] PULSE: tr8-104B is running for 14:43:37 since 2021-09-25T04:27:01 (1185639_1 on 'gpu_p13' partition (r6i5n[7-8],r6i6n0,r7i2n[4-5],r7i3n2,r7i6n[2-4],r7i7n[7-8],r8i0n[0,2-3,5-8],r8i1n[0,2-4],r8i2n8,r8i3n[0-2],r8i5n[3-4],r8i7n[3-8],r9i0n[0-5],r9i1n[0-3],r9i2n[3-6,8],r9i3n[0-1,7-8],r9i4n[0-3],r9i5n[3-8],r9i6n0) iteration 8850/ 159576 | consumed samples: 507760 | elapsed time per iteration (ms): 23681.3 | learning rate: 
6.000E-05 | global batch size: 176 | lm loss: 7.106478E+00 | loss scale: 1024.0 | grad norm: 15650.558 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 8860/ 159576 | consumed samples: 509520 | elapsed time per iteration (ms): 23830.0 | learning rate: 6.000E-05 | global batch size: 176 | lm loss: 7.077826E+00 | loss scale: 1024.0 | grad norm: 13271.961 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 8870/ 159576 | consumed samples: 511280 | elapsed time per iteration (ms): 23830.3 | learning rate: 6.000E-05 | global batch size: 176 | lm loss: 7.083195E+00 | loss scale: 1024.0 | grad norm: 13942.816 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 8880/ 159576 | consumed samples: 513040 | elapsed time per iteration (ms): 23893.7 | learning rate: 6.000E-05 | global batch size: 176 | lm loss: 7.101151E+00 | loss scale: 1024.0 | grad norm: 17666.067 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 8890/ 159576 | consumed samples: 514800 | elapsed time per iteration (ms): 23733.4 | learning rate: 6.000E-05 | global batch size: 176 | lm loss: 7.130984E+00 | loss scale: 2048.0 | grad norm: 41179.422 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 8900/ 159576 | consumed samples: 516560 | elapsed time per iteration (ms): 23693.0 | learning rate: 6.000E-05 | global batch size: 176 | lm loss: 7.084023E+00 | loss scale: 2048.0 | grad norm: 32703.102 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 8910/ 159576 | consumed samples: 518320 | elapsed time per iteration (ms): 23793.1 | learning rate: 6.000E-05 | global batch size: 176 | lm loss: 7.094463E+00 | loss scale: 2048.0 | grad norm: 46954.552 | num zeros: 0.0 | number of skipped iterations: 0 | number 
of nan iterations: 0 | time (ms) iteration 8920/ 159576 | consumed samples: 520112 | elapsed time per iteration (ms): 23988.6 | learning rate: 6.000E-05 | global batch size: 192 | lm loss: 7.094890E+00 | loss scale: 2048.0 | grad norm: 20910.711 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 8930/ 159576 | consumed samples: 522032 | elapsed time per iteration (ms): 24780.5 | learning rate: 6.000E-05 | global batch size: 192 | lm loss: 7.112840E+00 | loss scale: 2048.0 | grad norm: 23723.304 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 8940/ 159576 | consumed samples: 523952 | elapsed time per iteration (ms): 24880.9 | learning rate: 6.000E-05 | global batch size: 192 | lm loss: 7.157214E+00 | loss scale: 2048.0 | grad norm: 35769.072 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 8950/ 159576 | consumed samples: 525872 | elapsed time per iteration (ms): 24820.3 | learning rate: 6.000E-05 | global batch size: 192 | lm loss: 7.212303E+00 | loss scale: 2048.0 | grad norm: 20241.796 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 8960/ 159576 | consumed samples: 527792 | elapsed time per iteration (ms): 24706.7 | learning rate: 6.000E-05 | global batch size: 192 | lm loss: 7.215181E+00 | loss scale: 2048.0 | grad norm: 48969.302 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 8970/ 159576 | consumed samples: 529712 | elapsed time per iteration (ms): 23528.3 | learning rate: 6.000E-05 | global batch size: 192 | loss scale: 1024.0 | grad norm: 156762.139 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 8980/ 159576 | consumed samples: 531632 | elapsed time per iteration (ms): 18302.5 | learning rate: 6.000E-05 | global batch size: 192 | loss 
scale: 2.0 | grad norm: 5927.930 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 8990/ 159576 | consumed samples: 533552 | elapsed time per iteration (ms): 17645.0 | learning rate: 6.000E-05 | global batch size: 192 | loss scale: 1.0 | grad norm: 5927.930 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) [2021-09-25 20:10:52] PULSE: tr8-104B is scheduled to start in 18:35:32 (at 2021-09-26T14:46:25) (1188168 on 'gpu_p13' partition) [2021-09-25 20:10:52] PULSE: tr8-104B is running for 15:43:51 since 2021-09-25T04:27:01 (1185639_1 on 'gpu_p13' partition (r6i5n[7-8],r6i6n0,r7i2n[4-5],r7i3n2,r7i6n[2-4],r7i7n[7-8],r8i0n[0,2-3,5-8],r8i1n[0,2-4],r8i2n8,r8i3n[0-2],r8i5n[3-4],r8i7n[3-8],r9i0n[0-5],r9i1n[0-3],r9i2n[3-6,8],r9i3n[0-1,7-8],r9i4n[0-3],r9i5n[3-8],r9i6n0) iteration 9000/ 159576 | consumed samples: 535472 | elapsed time per iteration (ms): 17316.3 | learning rate: 6.000E-05 | global batch size: 192 | loss scale: 1.0 | grad norm: 5927.930 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) ------------------------------------------------------------------------------------------------ validation loss at iteration 9000 | lm loss value: 7.256732E+00 | lm loss PPL: 1.417617E+03 | ------------------------------------------------------------------------------------------------ saving checkpoint at iteration 9000 to /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints [2021-09-25 20:11:32,719] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/global_step9000/mp_rank_00_model_states.pt successfully saved checkpoint at iteration 9000 to /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints time (ms) | save-checkpoint: 17709.49 iteration 9010/ 159576 | consumed samples: 537392 | elapsed time per iteration (ms): 21623.6 | learning rate: 6.000E-05 
| global batch size: 192 | loss scale: 1.0 | grad norm: 5927.930 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 9020/ 159576 | consumed samples: 539312 | elapsed time per iteration (ms): 17559.0 | learning rate: 6.000E-05 | global batch size: 192 | loss scale: 1.0 | grad norm: 5927.930 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 9030/ 159576 | consumed samples: 541232 | elapsed time per iteration (ms): 17827.7 | learning rate: 6.000E-05 | global batch size: 192 | loss scale: 1.0 | grad norm: 5927.930 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 9040/ 159576 | consumed samples: 543152 | elapsed time per iteration (ms): 17458.2 | learning rate: 6.000E-05 | global batch size: 192 | loss scale: 1.0 | grad norm: 5927.930 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 9050/ 159576 | consumed samples: 545072 | elapsed time per iteration (ms): 17470.7 | learning rate: 6.000E-05 | global batch size: 192 | loss scale: 1.0 | grad norm: 5927.930 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 9060/ 159576 | consumed samples: 546992 | elapsed time per iteration (ms): 17813.0 | learning rate: 6.000E-05 | global batch size: 192 | loss scale: 1.0 | grad norm: 5927.930 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 9070/ 159576 | consumed samples: 548912 | elapsed time per iteration (ms): 17646.8 | learning rate: 6.000E-05 | global batch size: 192 | loss scale: 1.0 | grad norm: 5927.930 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 9080/ 159576 | consumed samples: 550832 | elapsed time per iteration (ms): 17634.4 | learning rate: 6.000E-05 | global batch size: 192 | loss scale: 1.0 | grad 
norm: 5927.930 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 9090/ 159576 | consumed samples: 552752 | elapsed time per iteration (ms): 17734.2 | learning rate: 6.000E-05 | global batch size: 192 | loss scale: 1.0 | grad norm: 5927.930 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 9100/ 159576 | consumed samples: 554672 | elapsed time per iteration (ms): 17470.3 | learning rate: 6.000E-05 | global batch size: 192 | loss scale: 1.0 | grad norm: 5927.930 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 9110/ 159576 | consumed samples: 556592 | elapsed time per iteration (ms): 17443.8 | learning rate: 6.000E-05 | global batch size: 192 | loss scale: 1.0 | grad norm: 5927.930 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 9120/ 159576 | consumed samples: 558512 | elapsed time per iteration (ms): 17456.2 | learning rate: 6.000E-05 | global batch size: 192 | loss scale: 1.0 | grad norm: 5927.930 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 9130/ 159576 | consumed samples: 560432 | elapsed time per iteration (ms): 17374.7 | learning rate: 6.000E-05 | global batch size: 192 | loss scale: 1.0 | grad norm: 5927.930 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 9140/ 159576 | consumed samples: 562352 | elapsed time per iteration (ms): 17541.4 | learning rate: 6.000E-05 | global batch size: 192 | loss scale: 1.0 | grad norm: 5927.930 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 9150/ 159576 | consumed samples: 564272 | elapsed time per iteration (ms): 17680.4 | learning rate: 6.000E-05 | global batch size: 192 | loss scale: 1.0 | grad norm: 5927.930 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | time (ms) iteration 9160/ 159576 | consumed samples: 566192 | elapsed time per iteration (ms): 17412.1 | learning rate: 6.000E-05 | global batch size: 192 | loss scale: 1.0 | grad norm: 5927.930 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 9170/ 159576 | consumed samples: 568208 | elapsed time per iteration (ms): 18281.1 | learning rate: 6.000E-05 | global batch size: 208 | loss scale: 1.0 | grad norm: 5927.930 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 9180/ 159576 | consumed samples: 570288 | elapsed time per iteration (ms): 18627.2 | learning rate: 6.000E-05 | global batch size: 208 | loss scale: 1.0 | grad norm: 5927.930 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 9190/ 159576 | consumed samples: 572368 | elapsed time per iteration (ms): 18546.6 | learning rate: 6.000E-05 | global batch size: 208 | loss scale: 1.0 | grad norm: 5927.930 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) [2021-09-25 21:10:54] PULSE: tr8-104B is scheduled to start in 17:35:30 (at 2021-09-26T14:46:25) (1188168 on 'gpu_p13' partition) [2021-09-25 21:10:54] PULSE: tr8-104B is running for 16:43:53 since 2021-09-25T04:27:01 (1185639_1 on 'gpu_p13' partition (r6i5n[7-8],r6i6n0,r7i2n[4-5],r7i3n2,r7i6n[2-4],r7i7n[7-8],r8i0n[0,2-3,5-8],r8i1n[0,2-4],r8i2n8,r8i3n[0-2],r8i5n[3-4],r8i7n[3-8],r9i0n[0-5],r9i1n[0-3],r9i2n[3-6,8],r9i3n[0-1,7-8],r9i4n[0-3],r9i5n[3-8],r9i6n0) iteration 9200/ 159576 | consumed samples: 574448 | elapsed time per iteration (ms): 18675.7 | learning rate: 6.000E-05 | global batch size: 208 | loss scale: 1.0 | grad norm: 5927.930 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 9210/ 159576 | consumed samples: 576528 | elapsed time per iteration (ms): 18679.9 
| learning rate: 6.000E-05 | global batch size: 208 | loss scale: 1.0 | grad norm: 5927.930 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 9220/ 159576 | consumed samples: 578608 | elapsed time per iteration (ms): 18524.7 | learning rate: 6.000E-05 | global batch size: 208 | loss scale: 1.0 | grad norm: 5927.930 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 9230/ 159576 | consumed samples: 580688 | elapsed time per iteration (ms): 18762.7 | learning rate: 6.000E-05 | global batch size: 208 | loss scale: 1.0 | grad norm: 5927.930 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 9240/ 159576 | consumed samples: 582768 | elapsed time per iteration (ms): 18695.7 | learning rate: 6.000E-05 | global batch size: 208 | loss scale: 1.0 | grad norm: 5927.930 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 9250/ 159576 | consumed samples: 584848 | elapsed time per iteration (ms): 18780.0 | learning rate: 6.000E-05 | global batch size: 208 | loss scale: 1.0 | grad norm: 5927.930 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 9260/ 159576 | consumed samples: 586928 | elapsed time per iteration (ms): 18593.2 | learning rate: 6.000E-05 | global batch size: 208 | loss scale: 1.0 | grad norm: 5927.930 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 9270/ 159576 | consumed samples: 589008 | elapsed time per iteration (ms): 18476.6 | learning rate: 6.000E-05 | global batch size: 208 | loss scale: 1.0 | grad norm: 5927.930 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 9280/ 159576 | consumed samples: 591088 | elapsed time per iteration (ms): 18595.2 | learning rate: 6.000E-05 | global batch size: 208 | 
loss scale: 1.0 | grad norm: 5927.930 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 9290/ 159576 | consumed samples: 593168 | elapsed time per iteration (ms): 18498.1 | learning rate: 6.000E-05 | global batch size: 208 | loss scale: 1.0 | grad norm: 5927.930 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 9300/ 159576 | consumed samples: 595248 | elapsed time per iteration (ms): 18531.6 | learning rate: 6.000E-05 | global batch size: 208 | loss scale: 1.0 | grad norm: 5927.930 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 9310/ 159576 | consumed samples: 597328 | elapsed time per iteration (ms): 18538.6 | learning rate: 6.000E-05 | global batch size: 208 | loss scale: 1.0 | grad norm: 5927.930 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 9320/ 159576 | consumed samples: 599408 | elapsed time per iteration (ms): 18768.3 | learning rate: 6.000E-05 | global batch size: 208 | loss scale: 1.0 | grad norm: 5927.930 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 9330/ 159576 | consumed samples: 601488 | elapsed time per iteration (ms): 18445.0 | learning rate: 6.000E-05 | global batch size: 208 | loss scale: 1.0 | grad norm: 5927.930 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 9340/ 159576 | consumed samples: 603568 | elapsed time per iteration (ms): 18700.8 | learning rate: 6.000E-05 | global batch size: 208 | loss scale: 1.0 | grad norm: 5927.930 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 9350/ 159576 | consumed samples: 605648 | elapsed time per iteration (ms): 18716.7 | learning rate: 6.000E-05 | global batch size: 208 | loss scale: 1.0 | grad norm: 5927.930 | num zeros: 
0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 9360/ 159576 | consumed samples: 607728 | elapsed time per iteration (ms): 18488.0 | learning rate: 6.000E-05 | global batch size: 208 | loss scale: 1.0 | grad norm: 5927.930 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 9370/ 159576 | consumed samples: 609808 | elapsed time per iteration (ms): 18621.0 | learning rate: 6.000E-05 | global batch size: 208 | loss scale: 1.0 | grad norm: 5927.930 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 9380/ 159576 | consumed samples: 611888 | elapsed time per iteration (ms): 18781.4 | learning rate: 6.000E-05 | global batch size: 208 | loss scale: 1.0 | grad norm: 5927.930 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 9390/ 159576 | consumed samples: 613968 | elapsed time per iteration (ms): 18582.4 | learning rate: 6.000E-05 | global batch size: 208 | loss scale: 1.0 | grad norm: 5927.930 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) [2021-09-25 22:11:04] PULSE: tr8-104B is scheduled to start in 17:17:05 (at 2021-09-26T15:28:10) (1188168 on 'gpu_p13' partition) [2021-09-25 22:11:04] PULSE: tr8-104B is running for 17:44:03 since 2021-09-25T04:27:01 (1185639_1 on 'gpu_p13' partition (r6i5n[7-8],r6i6n0,r7i2n[4-5],r7i3n2,r7i6n[2-4],r7i7n[7-8],r8i0n[0,2-3,5-8],r8i1n[0,2-4],r8i2n8,r8i3n[0-2],r8i5n[3-4],r8i7n[3-8],r9i0n[0-5],r9i1n[0-3],r9i2n[3-6,8],r9i3n[0-1,7-8],r9i4n[0-3],r9i5n[3-8],r9i6n0) iteration 9400/ 159576 | consumed samples: 616192 | elapsed time per iteration (ms): 19918.8 | learning rate: 6.000E-05 | global batch size: 224 | loss scale: 1.0 | grad norm: 5927.930 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 9410/ 159576 | consumed samples: 618432 | elapsed time per 
iteration (ms): 19675.6 | learning rate: 6.000E-05 | global batch size: 224 | loss scale: 1.0 | grad norm: 5927.930 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 9420/ 159576 | consumed samples: 620672 | elapsed time per iteration (ms): 19904.3 | learning rate: 6.000E-05 | global batch size: 224 | loss scale: 1.0 | grad norm: 5927.930 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 9430/ 159576 | consumed samples: 622912 | elapsed time per iteration (ms): 19702.9 | learning rate: 6.000E-05 | global batch size: 224 | loss scale: 1.0 | grad norm: 5927.930 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 9440/ 159576 | consumed samples: 625152 | elapsed time per iteration (ms): 19798.2 | learning rate: 6.000E-05 | global batch size: 224 | loss scale: 1.0 | grad norm: 5927.930 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 9450/ 159576 | consumed samples: 627392 | elapsed time per iteration (ms): 19797.6 | learning rate: 6.000E-05 | global batch size: 224 | loss scale: 1.0 | grad norm: 5927.930 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 9460/ 159576 | consumed samples: 629632 | elapsed time per iteration (ms): 20223.0 | learning rate: 6.000E-05 | global batch size: 224 | loss scale: 1.0 | grad norm: 5927.930 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 9470/ 159576 | consumed samples: 631872 | elapsed time per iteration (ms): 19847.6 | learning rate: 6.000E-05 | global batch size: 224 | loss scale: 1.0 | grad norm: 5927.930 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 9480/ 159576 | consumed samples: 634112 | elapsed time per iteration (ms): 19783.5 | learning rate: 6.000E-05 | 
global batch size: 224 | loss scale: 1.0 | grad norm: 5927.930 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 9490/ 159576 | consumed samples: 636352 | elapsed time per iteration (ms): 19768.8 | learning rate: 6.000E-05 | global batch size: 224 | loss scale: 1.0 | grad norm: 5927.930 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 9500/ 159576 | consumed samples: 638592 | elapsed time per iteration (ms): 19836.7 | learning rate: 6.000E-05 | global batch size: 224 | loss scale: 1.0 | grad norm: 5927.930 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 9510/ 159576 | consumed samples: 640832 | elapsed time per iteration (ms): 19791.2 | learning rate: 6.000E-05 | global batch size: 224 | loss scale: 1.0 | grad norm: 5927.930 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 9520/ 159576 | consumed samples: 643072 | elapsed time per iteration (ms): 19677.8 | learning rate: 6.000E-05 | global batch size: 224 | loss scale: 1.0 | grad norm: 5927.930 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 9530/ 159576 | consumed samples: 645312 | elapsed time per iteration (ms): 19695.3 | learning rate: 6.000E-05 | global batch size: 224 | loss scale: 1.0 | grad norm: 5927.930 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 9540/ 159576 | consumed samples: 647552 | elapsed time per iteration (ms): 19697.0 | learning rate: 6.000E-05 | global batch size: 224 | loss scale: 1.0 | grad norm: 5927.930 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 9550/ 159576 | consumed samples: 649792 | elapsed time per iteration (ms): 19776.4 | learning rate: 6.000E-05 | global batch size: 224 | loss scale: 1.0 | grad norm: 
5927.930 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
 iteration 9560/ 159576 | consumed samples: 652032 | elapsed time per iteration (ms): 19726.6 | learning rate: 6.000E-05 | global batch size: 224 | loss scale: 1.0 | grad norm: 5927.930 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
 iteration 9570/ 159576 | consumed samples: 654272 | elapsed time per iteration (ms): 19764.1 | learning rate: 6.000E-05 | global batch size: 224 | loss scale: 1.0 | grad norm: 5927.930 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
[2021-09-25 23:11:05] PULSE: tr8-104B is scheduled to start in 18:13:44 (at 2021-09-26T17:24:50) (1188168 on 'gpu_p13' partition)
[2021-09-25 23:11:05] PULSE: tr8-104B is running for 18:44:04 since 2021-09-25T04:27:01 (1185639_1 on 'gpu_p13' partition (r6i5n[7-8],r6i6n0,r7i2n[4-5],r7i3n2,r7i6n[2-4],r7i7n[7-8],r8i0n[0,2-3,5-8],r8i1n[0,2-4],r8i2n8,r8i3n[0-2],r8i5n[3-4],r8i7n[3-8],r9i0n[0-5],r9i1n[0-3],r9i2n[3-6,8],r9i3n[0-1,7-8],r9i4n[0-3],r9i5n[3-8],r9i6n0)
 iteration 9580/ 159576 | consumed samples: 656512 | elapsed time per iteration (ms): 19889.3 | learning rate: 6.000E-05 | global batch size: 224 | loss scale: 1.0 | grad norm: 5927.930 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
 iteration 9590/ 159576 | consumed samples: 658752 | elapsed time per iteration (ms): 19672.3 | learning rate: 6.000E-05 | global batch size: 224 | loss scale: 1.0 | grad norm: 5927.930 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
 iteration 9600/ 159576 | consumed samples: 660992 | elapsed time per iteration (ms): 19668.0 | learning rate: 6.000E-05 | global batch size: 224 | loss scale: 1.0 | grad norm: 5927.930 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
 iteration 9610/ 159576 | consumed samples: 663360 | elapsed time per iteration (ms): 20660.1 | learning rate: 6.000E-05 | global batch size: 240 | loss scale: 1.0 | grad norm: 5927.930 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
 iteration 9620/ 159576 | consumed samples: 665760 | elapsed time per iteration (ms): 20759.5 | learning rate: 6.000E-05 | global batch size: 240 | loss scale: 1.0 | grad norm: 5927.930 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
 iteration 9630/ 159576 | consumed samples: 668160 | elapsed time per iteration (ms): 20573.3 | learning rate: 6.000E-05 | global batch size: 240 | loss scale: 1.0 | grad norm: 5927.930 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
 iteration 9640/ 159576 | consumed samples: 670560 | elapsed time per iteration (ms): 21117.4 | learning rate: 6.000E-05 | global batch size: 240 | loss scale: 1.0 | grad norm: 5927.930 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
 iteration 9650/ 159576 | consumed samples: 672960 | elapsed time per iteration (ms): 21312.3 | learning rate: 6.000E-05 | global batch size: 240 | loss scale: 1.0 | grad norm: 5927.930 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
 iteration 9660/ 159576 | consumed samples: 675360 | elapsed time per iteration (ms): 20596.0 | learning rate: 6.000E-05 | global batch size: 240 | loss scale: 1.0 | grad norm: 5927.930 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
 iteration 9670/ 159576 | consumed samples: 677760 | elapsed time per iteration (ms): 20413.4 | learning rate: 6.000E-05 | global batch size: 240 | loss scale: 1.0 | grad norm: 5927.930 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
 iteration 9680/ 159576 | consumed samples: 680160 | elapsed time per iteration (ms): 20820.1 | learning rate: 6.000E-05 | global batch size: 240 | loss scale: 1.0 | grad norm: 5927.930 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
 iteration 9690/ 159576 | consumed samples: 682560 | elapsed time per iteration (ms): 20882.2 | learning rate: 6.000E-05 | global batch size: 240 | loss scale: 1.0 | grad norm: 5927.930 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
 iteration 9700/ 159576 | consumed samples: 684960 | elapsed time per iteration (ms): 21320.0 | learning rate: 6.000E-05 | global batch size: 240 | loss scale: 1.0 | grad norm: 5927.930 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
 iteration 9710/ 159576 | consumed samples: 687360 | elapsed time per iteration (ms): 20632.6 | learning rate: 6.000E-05 | global batch size: 240 | loss scale: 1.0 | grad norm: 5927.930 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
 iteration 9720/ 159576 | consumed samples: 689760 | elapsed time per iteration (ms): 20593.0 | learning rate: 6.000E-05 | global batch size: 240 | loss scale: 1.0 | grad norm: 5927.930 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
 iteration 9730/ 159576 | consumed samples: 692160 | elapsed time per iteration (ms): 21160.0 | learning rate: 6.000E-05 | global batch size: 240 | loss scale: 1.0 | grad norm: 5927.930 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
 iteration 9740/ 159576 | consumed samples: 694560 | elapsed time per iteration (ms): 20918.8 | learning rate: 6.000E-05 | global batch size: 240 | loss scale: 1.0 | grad norm: 5927.930 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
[2021-09-26 00:11:13] PULSE: tr8-104B is scheduled to start in 17:13:36 (at 2021-09-26T17:24:50) (1188168 on 'gpu_p13' partition)
[2021-09-26 00:11:13] PULSE: tr8-104B is running for 19:44:12 since 2021-09-25T04:27:01 (1185639_1 on 'gpu_p13' partition (r6i5n[7-8],r6i6n0,r7i2n[4-5],r7i3n2,r7i6n[2-4],r7i7n[7-8],r8i0n[0,2-3,5-8],r8i1n[0,2-4],r8i2n8,r8i3n[0-2],r8i5n[3-4],r8i7n[3-8],r9i0n[0-5],r9i1n[0-3],r9i2n[3-6,8],r9i3n[0-1,7-8],r9i4n[0-3],r9i5n[3-8],r9i6n0)
 iteration 9750/ 159576 | consumed samples: 696960 | elapsed time per iteration (ms): 20828.1 | learning rate: 6.000E-05 | global batch size: 240 | loss scale: 1.0 | grad norm: 5927.930 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
 iteration 9760/ 159576 | consumed samples: 699360 | elapsed time per iteration (ms): 20766.8 | learning rate: 6.000E-05 | global batch size: 240 | loss scale: 1.0 | grad norm: 5927.930 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
saving checkpoint at iteration 9768 to /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints
[2021-09-26 00:17:36,090] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/global_step9768/mp_rank_00_model_states.pt
 successfully saved checkpoint at iteration 9768 to /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints
time (ms) | save-checkpoint: 22024.89
[exiting program after 1190.3113538821538 minutes] datetime: 2021-09-26 00:17:52
[2021-09-26 01:11:06] PULSE: tr8-104B is scheduled to start in 18:25:25 (at 2021-09-26T19:36:32) (1188168 on 'gpu_p13' partition)
[2021-09-26 02:11:19] PULSE: tr8-104B is scheduled to start in 17:25:12 (at 2021-09-26T19:36:32) (1188168 on 'gpu_p13' partition)
[2021-09-26 03:11:35] PULSE: tr8-104B is scheduled to start in 19:51:55 (at 2021-09-26T23:03:31) (1188168 on 'gpu_p13' partition)
[2021-09-26 04:11:39] PULSE: tr8-104B is scheduled to start in 19:06:56 (at 2021-09-26T23:18:36) (1188168 on 'gpu_p13' partition)
[2021-09-26 05:11:41] PULSE: tr8-104B is scheduled to start in 18:19:12 (at 2021-09-26T23:30:54) (1188168 on 'gpu_p13' partition)
[2021-09-26 06:11:46] PULSE: tr8-104B is scheduled to start in 17:19:07 (at 2021-09-26T23:30:54) (1188168 on 'gpu_p13' partition)
[2021-09-26 07:11:59] PULSE: tr8-104B is scheduled to start in 17:27:45 (at 2021-09-27T00:39:45) (1188168 on 'gpu_p13' partition)
[2021-09-26 08:12:02] PULSE: tr8-104B is scheduled to start in 12:30:49 (at 2021-09-26T20:42:52) (1188168 on 'gpu_p13' partition)
[2021-09-26 09:12:23] PULSE: tr8-104B is scheduled to start in 11:30:28 (at 2021-09-26T20:42:52) (1188168 on 'gpu_p13' partition)
[2021-09-26 10:12:24] PULSE: tr8-104B is scheduled to start in 10:30:27 (at 2021-09-26T20:42:52) (1188168 on 'gpu_p13' partition)
[2021-09-26 11:12:28] PULSE: tr8-104B is scheduled to start in 9:30:23 (at 2021-09-26T20:42:52) (1188168 on 'gpu_p13' partition)
[2021-09-26 12:12:40] PULSE: tr8-104B is scheduled to start in 10:14:45 (at 2021-09-26T22:27:26) (1188168 on 'gpu_p13' partition)
[2021-09-26 13:12:49] PULSE: tr8-104B is scheduled to start in 9:14:36 (at 2021-09-26T22:27:26) (1188168 on 'gpu_p13' partition)
[2021-09-26 14:12:56] PULSE: tr8-104B is scheduled to start in 8:33:42 (at 2021-09-26T22:46:39) (1188168 on 'gpu_p13' partition)
[2021-09-26 15:13:22] PULSE: tr8-104B is scheduled to start in 7:16:41 (at 2021-09-26T22:30:04) (1188168 on 'gpu_p13' partition)
[2021-09-26 16:13:24] PULSE: tr8-104B is scheduled to start in 6:16:39 (at 2021-09-26T22:30:04) (1188168 on 'gpu_p13' partition)
[2021-09-26 17:13:32] PULSE: tr8-104B is scheduled to start in 5:16:31 (at 2021-09-26T22:30:04) (1188168 on 'gpu_p13' partition)
[2021-09-26 18:13:29] PULSE: tr8-104B is scheduled to start in 9:13:25 (at 2021-09-27T03:26:55) (1188168 on 'gpu_p13' partition)
[2021-09-26 19:13:42] PULSE: tr8-104B is scheduled to start in 12:06:13 (at 2021-09-27T07:19:56) (1188168 on 'gpu_p13' partition)
[2021-09-26 20:13:45] PULSE: tr8-104B is scheduled to start in 11:06:10 (at 2021-09-27T07:19:56) (1188168 on 'gpu_p13' partition)
[2021-09-26 21:14:04] PULSE: tr8-104B is scheduled to start in 18:20:04 (at 2021-09-27T15:34:09) (1188168 on 'gpu_p13' partition)
[2021-09-26 22:14:04] PULSE: tr8-104B is scheduled to start in 17:20:04 (at 2021-09-27T15:34:09) (1188168 on 'gpu_p13' partition)
[2021-09-26 23:14:12] PULSE: tr8-104B is scheduled to start in 16:36:40 (at 2021-09-27T15:50:53) (1188168 on 'gpu_p13' partition)
[2021-09-27 00:14:11] PULSE: tr8-104B is scheduled to start in 15:32:33 (at 2021-09-27T15:46:45) (1188168 on 'gpu_p13' partition)
[2021-09-27 01:14:15] PULSE: tr8-104B is scheduled to start in 14:32:29 (at 2021-09-27T15:46:45) (1188168 on 'gpu_p13' partition)
[2021-09-27 02:14:18] PULSE: tr8-104B is scheduled to start in 10:17:12 (at 2021-09-27T12:31:31) (1188168 on 'gpu_p13' partition)
[2021-09-27 03:14:23] PULSE: tr8-104B is scheduled to start in 9:17:07 (at 2021-09-27T12:31:31) (1188168 on 'gpu_p13' partition)
*****************************************
Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
*****************************************
***************************************** ***************************************** Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. ***************************************** ***************************************** Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. ***************************************** ***************************************** Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. ***************************************** ***************************************** Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. ***************************************** ***************************************** Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. ***************************************** ***************************************** Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. 
***************************************** ***************************************** Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. ***************************************** ***************************************** Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. ***************************************** ***************************************** Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. ***************************************** ***************************************** Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. ***************************************** ***************************************** Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. ***************************************** ***************************************** Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. 
***************************************** ***************************************** Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. ***************************************** ***************************************** Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. ***************************************** ***************************************** Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. ***************************************** ***************************************** Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. ***************************************** ***************************************** Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. ***************************************** ***************************************** Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. 
***************************************** ***************************************** Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. ***************************************** ***************************************** Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. ***************************************** ***************************************** Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. ***************************************** ***************************************** Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. ***************************************** ***************************************** Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. ***************************************** ***************************************** Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. 
***************************************** ***************************************** Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. ***************************************** ***************************************** Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. ***************************************** ***************************************** Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. ***************************************** ***************************************** Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. ***************************************** ***************************************** Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. ***************************************** -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. 
[OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] ninja .................. [OKAY] fused_lamb ............. [NO] ....... 
[OKAY] -------------------------------------------------- sparse_attn ............ [NO] ....... [OKAY] op name ................ installed .. compatible -------------------------------------------------- transformer ............ [NO] ....... [OKAY] cpu_adam ............... [YES] ...... [OKAY] stochastic_transformer . [NO] ....... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. 
-------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- ninja .................. [OKAY] DeepSpeed C++/CUDA extension op report -------------------------------------------------- -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. 
Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. 
compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] ninja .................. [OKAY] sparse_attn ............ [NO] ....... [OKAY] -------------------------------------------------- transformer ............ [NO] ....... [OKAY] op name ................ installed .. compatible stochastic_transformer . [NO] ....... [OKAY] -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] ninja .................. [OKAY] fused_adam ............. [NO] ....... 
[OKAY] -------------------------------------------------- fused_lamb ............. [NO] ....... [OKAY] op name ................ installed .. compatible -------------------------------------------------- sparse_attn ............ [NO] ....... [OKAY] cpu_adam ............... [YES] ...... [OKAY] transformer ............ [NO] ....... [OKAY] fused_adam ............. [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] ninja .................. [OKAY] transformer ............ [NO] ....... [OKAY] -------------------------------------------------- stochastic_transformer . [NO] ....... [OKAY] op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. 
-------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... 
[OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] ninja .................. [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] DeepSpeed C++/CUDA extension op report -------------------------------------------------- fused_adam ............. [NO] ....... [OKAY] NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. 
compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] ninja .................. [OKAY] ninja .................. [OKAY] -------------------------------------------------- -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- op name ................ installed .. compatible cpu_adam ............... [YES] ...... [OKAY] -------------------------------------------------- fused_adam ............. [NO] ....... [OKAY] cpu_adam ............... [YES] ...... [OKAY] fused_lamb ............. [NO] ....... [OKAY] fused_adam ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] sparse_attn ............ 
--------------------------------------------------
DeepSpeed C++/CUDA extension op report
--------------------------------------------------
NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.
--------------------------------------------------
JIT compiled ops requires ninja
ninja .................. [OKAY]
--------------------------------------------------
op name ................ installed .. compatible
--------------------------------------------------
cpu_adam ............... [YES] ...... [OKAY]
fused_adam ............. [NO] ....... [OKAY]
fused_lamb ............. [NO] ....... [OKAY]
sparse_attn ............ [NO] ....... [OKAY]
transformer ............ [NO] ....... [OKAY]
stochastic_transformer . [NO] ....... [OKAY]
--------------------------------------------------
compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. 
-------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] ninja .................. [OKAY] fused_lamb ............. [NO] ....... [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- sparse_attn ............ [NO] ....... [OKAY] cpu_adam ............... [YES] ...... [OKAY] transformer ............ [NO] ....... [OKAY] fused_adam ............. [NO] ....... [OKAY] stochastic_transformer . 
[NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] ninja .................. [OKAY] -------------------------------------------------- sparse_attn ............ [NO] ....... [OKAY] op name ................ installed .. compatible -------------------------------------------------- transformer ............ [NO] ....... [OKAY] cpu_adam ............... [YES] ...... [OKAY] stochastic_transformer . [NO] ....... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. 
compatible ninja .................. [OKAY] -------------------------------------------------- -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] -------------------------------------------------- fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] JIT compiled ops requires ninja sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] -------------------------------------------------- transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] DeepSpeed C++/CUDA extension op report -------------------------------------------------- stochastic_transformer . [NO] ....... [OKAY] ninja .................. [OKAY] NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- ninja .................. [OKAY] cpu_adam ............... [YES] ...... 
[OKAY] -------------------------------------------------- -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- fused_adam ............. [NO] ....... [OKAY] op name ................ installed .. compatible JIT compiled ops requires ninja fused_lamb ............. [NO] ....... [OKAY] -------------------------------------------------- sparse_attn ............ [NO] ....... [OKAY] cpu_adam ............... [YES] ...... [OKAY] transformer ............ [NO] ....... [OKAY] fused_adam ............. [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] ninja .................. [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] ninja .................. 
[OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... 
[OKAY] ninja .................. [OKAY] sparse_attn ............ [NO] ....... [OKAY] -------------------------------------------------- transformer ............ [NO] ....... [OKAY] op name ................ installed .. compatible -------------------------------------------------- ninja .................. [OKAY] stochastic_transformer . [NO] ....... [OKAY] cpu_adam ............... [YES] ...... [OKAY] -------------------------------------------------- fused_adam ............. [NO] ....... [OKAY] op name ................ installed .. compatible -------------------------------------------------- fused_lamb ............. [NO] ....... [OKAY] cpu_adam ............... [YES] ...... [OKAY] sparse_attn ............ [NO] ....... [OKAY] fused_adam ............. [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] ninja .................. [OKAY] -------------------------------------------------- sparse_attn ............ [NO] ....... [OKAY] op name ................ installed .. compatible -------------------------------------------------- transformer ............ [NO] ....... [OKAY] cpu_adam ............... [YES] ...... [OKAY] stochastic_transformer . [NO] ....... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] ninja .................. 
[OKAY] -------------------------------------------------- -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- op name ................ installed .. compatible NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. 
-------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. -------------------------------------------------- async_io ............... [NO] ....... [NO] op name ................ installed .. compatible -------------------------------------------------- transformer_inference .. [NO] ....... [OKAY] cpu_adam ............... [YES] ...... [OKAY] utils .................. [YES] ...... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report fused_adam ............. [NO] ....... [OKAY] quantizer .............. [NO] ....... 
[OKAY] -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- fused_lamb ............. [NO] ....... [OKAY] -------------------------------------------------- JIT compiled ops requires ninja sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- ninja .................. [OKAY] cpu_adam ............... [YES] ...... [OKAY] -------------------------------------------------- fused_adam ............. [NO] ....... [OKAY] op name ................ installed .. compatible -------------------------------------------------- fused_lamb ............. [NO] ....... [OKAY] cpu_adam ............... [YES] ...... [OKAY] sparse_attn ............ [NO] ....... [OKAY] fused_adam ............. [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... 
[OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. 
-------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. 
Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- stochastic_transformer . [NO] ....... [OKAY] JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible ninja .................. [OKAY] -------------------------------------------------- -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] op name ................ installed .. compatible -------------------------------------------------- fused_adam ............. [NO] ....... [OKAY] cpu_adam ............... [YES] ...... 
--------------------------------------------------
DeepSpeed C++/CUDA extension op report
--------------------------------------------------
NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meets the required dependencies to JIT install the op.
--------------------------------------------------
JIT compiled ops requires ninja
ninja .................. [OKAY]
--------------------------------------------------
op name ................ installed .. compatible
--------------------------------------------------
cpu_adam ............... [YES] ...... [OKAY]
fused_adam ............. [NO] ....... [OKAY]
fused_lamb ............. [NO] ....... [OKAY]
sparse_attn ............ [NO] ....... [OKAY]
transformer ............ [NO] ....... [OKAY]
stochastic_transformer . [NO] ....... [OKAY]
 [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`.
async_io ............... [NO] ....... [NO]
transformer_inference .. [NO] ....... [OKAY]
utils .................. [YES] ...... [OKAY]
quantizer .............. [NO] ....... [OKAY]
--------------------------------------------------
[NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. DeepSpeed C++/CUDA extension op report -------------------------------------------------- fused_lamb ............. [NO] ....... [OKAY] async_io ............... [NO] ....... [NO] NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja sparse_attn ............ [NO] ....... [OKAY] transformer_inference .. [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] stochastic_transformer . [NO] ....... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. 
compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. -------------------------------------------------- async_io ............... [NO] ....... [NO] cpu_adam ............... [YES] ...... [OKAY] transformer_inference .. [NO] ....... [OKAY] fused_adam ............. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] fused_lamb ............. [NO] ....... [OKAY] quantizer .............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] -------------------------------------------------- transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] ninja .................. [OKAY] -------------------------------------------------- op name ................ 
installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] -------------------------------------------------- -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja sparse_attn ............ [NO] ....... [OKAY] DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. transformer ............ [NO] ....... [OKAY] -------------------------------------------------- JIT compiled ops requires ninja stochastic_transformer . [NO] ....... [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. 
[OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] ninja .................. [OKAY] utils .................. [YES] ...... [OKAY] -------------------------------------------------- quantizer .............. [NO] ....... [OKAY] op name ................ installed .. compatible -------------------------------------------------- -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... 
[OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. 
Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] ninja .................. [OKAY] sparse_attn ............ [NO] ....... [OKAY] -------------------------------------------------- transformer ............ [NO] ....... [OKAY] op name ................ installed .. compatible -------------------------------------------------- stochastic_transformer . [NO] ....... [OKAY] cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. sparse_attn ............ [NO] ....... [OKAY] async_io ............... [NO] ....... [NO]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. 
Can be fixed by: `apt install libaio-dev`. transformer ............ [NO] ....... [OKAY] transformer_inference .. [NO] ....... [OKAY] async_io ............... [NO] ....... [NO] stochastic_transformer . [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] transformer_inference .. [NO] ....... [OKAY] ninja .................. [OKAY] quantizer .............. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] -------------------------------------------------- ninja .................. [OKAY] -------------------------------------------------- quantizer .............. [NO] ....... [OKAY] op name ................ installed .. compatible -------------------------------------------------- -------------------------------------------------- ninja .................. [OKAY] -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] op name ................ installed .. compatible -------------------------------------------------- -------------------------------------------------- fused_adam ............. [NO] ....... [OKAY] cpu_adam ............... [YES] ...... [OKAY] op name ................ installed .. compatible -------------------------------------------------- fused_lamb ............. [NO] ....... [OKAY] fused_adam ............. [NO] ....... [OKAY] cpu_adam ............... [YES] ...... [OKAY] sparse_attn ............ [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] fused_adam ............. [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] ninja .................. [OKAY] stochastic_transformer . [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] -------------------------------------------------- stochastic_transformer . [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] op name ................ installed .. 
compatible -------------------------------------------------- stochastic_transformer . [NO] ....... [OKAY] cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] ninja .................. [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. -------------------------------------------------- async_io ............... [NO] ....... [NO] op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] transformer_inference .. [NO] ....... [OKAY] fused_adam ............. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] fused_lamb ............. [NO] ....... [OKAY] quantizer .............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] -------------------------------------------------- -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. 
-------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`.  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] transformer_inference utils.. ..................[NO] [YES]....... ......[OKAY] [OKAY] quantizer utils.............. ..................[NO] [YES]....... ......[OKAY] [OKAY] -------------------------------------------------- quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. fused_adam ............. [NO] ....... [OKAY] async_io ............... 
[NO] ....... [NO] fused_lamb ............. [NO] ....... [OKAY] transformer_inference .. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] transformer ............ [NO] ....... [OKAY] quantizer .............. [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- ninja .................. [OKAY] -------------------------------------------------- ninja .................. [OKAY] op name ................ installed .. compatible -------------------------------------------------- -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] op name ................ installed .. compatible fused_adam ............. [NO] ....... [OKAY] -------------------------------------------------- fused_lamb ............. [NO] ....... [OKAY] cpu_adam ............... [YES] ...... [OKAY] sparse_attn ............ [NO] ....... [OKAY] fused_adam ............. [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] ninja .................. [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... 
[OKAY] async_io ............... [NO] ....... [NO] fused_adam ............. [NO] ....... [OKAY] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. fused_lamb ............. [NO] ....... [OKAY] quantizer .............. [NO] ....... [OKAY] async_io ............... [NO] ....... [NO] sparse_attn ............ [NO] ....... [OKAY] -------------------------------------------------- transformer_inference .. [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] stochastic_transformer . [NO] ....... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... 
[OKAY] -------------------------------------------------- -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. -------------------------------------------------- async_io ............... [NO] ....... [NO] DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- transformer_inference .. [NO] ....... [OKAY] JIT compiled ops requires ninja utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. -------------------------------------------------- async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . 
[NO] ....... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. -------------------------------------------------- JIT compiled ops requires ninja async_io ............... [NO] ....... [NO] ninja .................. [OKAY] transformer_inference .. [NO] ....... [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- utils .................. [YES] ...... [OKAY] cpu_adam ............... [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] fused_adam ............. [NO] ....... [OKAY] -------------------------------------------------- fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. stochastic_transformer . [NO] ....... [OKAY] async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... 
[WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`.
async_io ............... [NO] ....... [NO]
transformer_inference .. [NO] ....... [OKAY]
utils .................. [YES] ...... [OKAY]
quantizer .............. [NO] ....... [OKAY]
--------------------------------------------------
DeepSpeed C++/CUDA extension op report
--------------------------------------------------
NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.
--------------------------------------------------
JIT compiled ops requires ninja
ninja .................. [OKAY]
--------------------------------------------------
op name ................ installed .. compatible
--------------------------------------------------
cpu_adam ............... [YES] ...... [OKAY]
fused_adam ............. [NO] ....... [OKAY]
fused_lamb ............. [NO] ....... [OKAY]
sparse_attn ............ [NO] ....... [OKAY]
transformer ............ [NO] ....... [OKAY]
stochastic_transformer . [NO] ....... [OKAY]
--------------------------------------------------
DeepSpeed general environment info:
torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']
torch version .................... 1.8.1
torch cuda version ............... 11.1
nvcc version ..................... 11.2
deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']
deepspeed info ................... 0.4.2+bc17042, bc17042, big-science
deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1
--------------------------------------------------
ninja ..................
[OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] DeepSpeed general environment info: fused_adam ............. [NO] ....... [OKAY] torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 fused_lamb ............. [NO] ....... [OKAY] torch cuda version ............... 11.1 nvcc version ..................... 11.2 sparse_attn ............ [NO] ....... [OKAY] deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] DeepSpeed general environment info:  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] async_io ............... [NO] ....... [NO] DeepSpeed general environment info: torch version .................... 1.8.1 transformer_inference .. [NO] ....... [OKAY] torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch cuda version ............... 11.1 utils .................. [YES] ...... 
[OKAY] torch version .................... 1.8.1 nvcc version ..................... 11.2 quantizer .............. [NO] ....... [OKAY] torch cuda version ............... 11.1 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science -------------------------------------------------- nvcc version ..................... 11.2 deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] DeepSpeed general environment info: transformer_inference .. [NO] ....... [OKAY] torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] utils .................. [YES] ...... [OKAY] torch version .................... 1.8.1 quantizer .............. [NO] ....... [OKAY] torch cuda version ............... 11.1 -------------------------------------------------- nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... 
[OKAY] -------------------------------------------------- ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] -------------------------------------------------- ninja .................. [OKAY] sparse_attn ............ [NO] ....... [OKAY] DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. -------------------------------------------------- transformer ............ [NO] ....... [OKAY] JIT compiled ops requires ninja async_io ............... [NO] ....... [NO] op name ................ installed .. compatible -------------------------------------------------- stochastic_transformer . [NO] ....... [OKAY] transformer_inference .. [NO] ....... [OKAY] cpu_adam ............... [YES] ...... [OKAY] utils .................. [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] quantizer .............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] -------------------------------------------------- sparse_attn ............ [NO] ....... [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. transformer ............ [NO] ....... [OKAY] async_io ............... [NO] ....... [NO] stochastic_transformer . [NO] ....... [OKAY] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. 
[NO] ....... [OKAY] -------------------------------------------------- -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_ioasync_io .............................. [NO][NO] .............. [NO][NO] transformer_inference .. transformer_inference[NO] ......... [NO][OKAY] ....... [OKAY] utils .................. utils[YES] ........................ [YES][OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. ...... 
[OKAY]quantizer async_io ...............async_io [NO]............... .......[NO] [NO]....... .............. [NO] quantizer....... ..............[OKAY] [NO] ....... [OKAY]-------------------------------------------------- -------------------------------------------------- [NO] transformer_inference .. transformer_inference[NO] ......... [NO][OKAY] ....... [OKAY] utils .................. [YES]utils ........................ [OKAY][YES] ...... [OKAY] quantizer .............. [NO]quantizer ..................... [OKAY][NO]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. ....... [OKAY] -------------------------------------------------- async_io ............... [NO] ....... [NO] -------------------------------------------------- transformer_inference .. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- utils .................. [YES] ...... [OKAY] JIT compiled ops requires ninja  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. quantizer .............. [NO] ....... [OKAY] async_io ............... [NO] ....... [NO] -------------------------------------------------- transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] ninja .................. [OKAY] -------------------------------------------------- -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- -------------------------------------------------- cpu_adam ............... 
[YES] ...... [OKAY] DeepSpeed C++/CUDA extension op report -------------------------------------------------- fused_adam ............. [NO] ....... [OKAY] NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] -------------------------------------------------- DeepSpeed C++/CUDA extension op report torch version .................... 1.8.1 -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja torch cuda version ............... 11.1 nvcc version ..................... 11.2 ninja .................. [OKAY] deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science -------------------------------------------------- deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... 
[OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] -------------------------------------------------- -------------------------------------------------- sparse_attn ............ [NO] ....... 
[OKAY] DeepSpeed C++/CUDA extension op report DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. transformer ............ [NO] ....... [OKAY] -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] async_io ............... [NO] ....... [NO] stochastic_transformer . [NO] ....... [OKAY] -------------------------------------------------- transformer_inference .. [NO] ....... [OKAY] op name ................ installed .. compatible -------------------------------------------------- utils .................. [YES] ...... [OKAY] cpu_adam ............... [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] fused_adam ............. [NO] ....... [OKAY] -------------------------------------------------- fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... 
[OKAY] -------------------------------------------------- -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] -------------------------------------------------- stochastic_transformer . [NO] ....... [OKAY] DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- -------------------------------------------------- JIT compiled ops requires ninja DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- ninja .................. [OKAY] op name ................ installed .. 
compatible -------------------------------------------------- -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- ninja .................. [OKAY] -------------------------------------------------- ninja .................. [OKAY] op name ................ installed .. compatible -------------------------------------------------- -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... 
[OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. stochastic_transformer . [NO] ....... [OKAY] async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 ninja .................. [OKAY] torch cuda version ............... 11.1 nvcc version ..................... 11.2 -------------------------------------------------- op name ................ installed .. compatible deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] -------------------------------------------------- deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. 
Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. ninja .................. [OKAY] -------------------------------------------------- async_io ............... [NO] ....... [NO] op name ................ installed .. compatible -------------------------------------------------- transformer_inference .. [NO] ....... [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. cpu_adam ............... [YES] ...... [OKAY] utils .................. [YES] ...... [OKAY] async_io ............... [NO] ....... [NO] fused_adam ............. [NO] ....... [OKAY] quantizer .............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] -------------------------------------------------- transformer_inference .. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. utils .................. [YES] ...... [OKAY] transformer ............ [NO] ....... [OKAY] async_io ............... [NO] ....... [NO] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- stochastic_transformer . [NO] ....... [OKAY] transformer_inference .. [NO] ....... 
[OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... 
--------------------------------------------------
DeepSpeed C++/CUDA extension op report
--------------------------------------------------
NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.
--------------------------------------------------
JIT compiled ops requires ninja
ninja .................. [OKAY]
--------------------------------------------------
op name ................ installed .. compatible
--------------------------------------------------
cpu_adam ............... [YES] ...... [OKAY]
fused_adam ............. [NO] ....... [OKAY]
fused_lamb ............. [NO] ....... [OKAY]
sparse_attn ............ [NO] ....... [OKAY]
transformer ............ [NO] ....... [OKAY]
stochastic_transformer . [NO] ....... [OKAY]
--------------------------------------------------
 [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`.
async_io ............... [NO] ....... [NO]
transformer_inference .. [NO] ....... [OKAY]
utils .................. [YES] ...... [OKAY]
quantizer .............. [NO] ....... [OKAY]
--------------------------------------------------
/bin/sh: line 0: type: git: not found
DeepSpeed general environment info:
torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']
torch version .................... 1.8.1
torch cuda version ............... 11.1
nvcc version ..................... 11.2
deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']
deepspeed info ................... 0.4.2+bc17042, bc17042, big-science
deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1
11.2 /bin/sh: line 0: type: git: not found -------------------------------------------------- torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- torch cuda version ............... 11.1 deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 JIT compiled ops requires ninja nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch version .................... 1.8.1 torch cuda version ............... 11.1 DeepSpeed general environment info: torch cuda version ............... 11.1 nvcc version ..................... 11.2 torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] nvcc version ..................... 11.2 deepspeed install path ........... 
['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] torch version .................... 1.8.1 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 torch cuda version ............... 11.1 deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. 
Can be fixed by: `apt install libaio-dev`. [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_ioasync_io .............................. [NO][NO] .............. [NO][NO] transformer_inferencetransformer_inference .... [NO][NO] .............. [OKAY] [OKAY] utilsutils .................................... [YES][YES] ............ [OKAY][OKAY] quantizer ..............quantizer [NO] ..................... [NO][OKAY] ....... [OKAY] -------------------------------------------------- -------------------------------------------------- -------------------------------------------------- ninja .................. [OKAY] DeepSpeed C++/CUDA extension op report -------------------------------------------------- -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] ninja .................. [OKAY] -------------------------------------------------- DeepSpeed general environment info: op name ................ installed .. compatible torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] -------------------------------------------------- torch version .................... 1.8.1 cpu_adam ............... [YES] ...... [OKAY] torch cuda version ............... 11.1 fused_adam ............. [NO] ....... [OKAY] nvcc version ..................... 
11.2 fused_lamb ............. [NO] ....... [OKAY] deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] sparse_attn ............ [NO] ....... [OKAY] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found DeepSpeed general environment info: /bin/sh: line 0: type: git: not found -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja /bin/sh: line 0: type: git: not found torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 /bin/sh: line 0: type: git: not found -------------------------------------------------- DeepSpeed C++/CUDA extension op report /bin/sh: line 0: type: git: not found nvcc version ..................... 11.2 deepspeed install path ........... 
['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science /bin/sh: line 0: type: git: not found -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. /bin/sh: line 0: type: git: not found deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found -------------------------------------------------- JIT compiled ops requires ninja /bin/sh: line 0: type: git: not found ninja .................. [OKAY] /bin/sh: line 0: type: git: not found -------------------------------------------------- /bin/sh: line 0: type: git: not found op name ................ installed .. compatible -------------------------------------------------- /bin/sh: line 0: type: git: not found cpu_adam ............... [YES] ...... [OKAY] /bin/sh: line 0: type: git: not found fused_adam ............. [NO] ....... [OKAY] /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. 
Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- ninja .................. [OKAY]cpu_adam ...............-------------------------------------------------- [YES] ......op name ................[OKAY] installed .. compatible -------------------------------------------------- fused_adam ............. [NO] cpu_adam....... ...............[OKAY] [YES] ...... fused_lamb[OKAY] ............. [NO] ....... [OKAY] fused_adam ............. [NO] ....... [OKAY] sparse_attn fused_lamb............ .............[NO] [NO]....... .......[OKAY] [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer sparse_attn ............. [NO][NO] ....... .......[OKAY] [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. 
compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] ninjafused_adam ............................... [OKAY][NO] --------------------------------------------------....... op name[OKAY] ................ installed fused_lamb.. compatible............. --------------------------------------------------[NO] ....... [OKAY] DeepSpeed general environment info: cpu_adam ............... [YES] ...... [OKAY] torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 ninja .................. [OKAY] -------------------------------------------------- sparse_attn ............ [NO] ....... [OKAY] torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science op name ................ installed .. compatible fused_adam .............transformer [NO] ................... [NO][OKAY] deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 -------------------------------------------------- ....... [OKAY]fused_lamb cpu_adam ............... [YES] ...... [OKAY] ............. [NO] ....... stochastic_transformer[OKAY] fused_adam ............. [NO] ....... [OKAY] . [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] /bin/sh: line 0: type: git: not found sparse_attn ............ [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] DeepSpeed general environment info: transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 stochastic_transformer . [NO] ....... 
[OKAY] DeepSpeed general environment info: torch cuda version ............... 11.1 nvcc version ..................... 11.2 torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] /bin/sh: line 0: type: git: not found torch version .................... 1.8.1 /bin/sh: line 0: type: git: not found deepspeed info ................... 0.4.2+bc17042, bc17042, big-science /bin/sh: line 0: type: git: not found torch cuda version ............... 11.1 /bin/sh: line 0: type: git: not found deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found nvcc version ..................... 11.2 /bin/sh: line 0: type: git: not found deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science /bin/sh: line 0: type: git: not found deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found ninja .................. [OKAY] /bin/sh: line 0: type: git: not found -------------------------------------------------- /bin/sh: line 0: type: git: not found op name ................ installed .. compatible -------------------------------------------------- /bin/sh: line 0: type: git: not found cpu_adam ............... [YES] ...... [OKAY] /bin/sh: line 0: type: git: not found fused_adam ............. [NO] ....... [OKAY] /bin/sh: line 0: type: git: not found fused_lamb ............. [NO] ....... [OKAY] /bin/sh: line 0: type: git: not found sparse_attn ............ [NO] ....... [OKAY] /bin/sh: line 0: type: git: not found transformer ............ [NO] ....... [OKAY] /bin/sh: line 0: type: git: not found stochastic_transformer . [NO] ....... [OKAY] DeepSpeed general environment info: torch install path ............... 
['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 DeepSpeed general environment info: nvcc version ..................... 11.2 torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] /bin/sh: line 0: type: git: not found torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 
0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 /bin/sh: line 0: type: git: not found DeepSpeed general environment info: torch cuda version ............... 11.1 DeepSpeed general environment info:torch install path ............... torch install path ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']............... nvcc version ..................... 11.2 /bin/sh: line 0: type: git: not found torch version .................... 1.8.1 ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] /bin/sh: line 0: type: git: not found deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] torch cuda version torch version............... ....................11.1 deepspeed info ................... 0.4.2+bc17042, bc17042, big-science 1.8.1 deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 nvcc version .....................torch cuda version 11.2............... deepspeed install path11.1 ...........nvcc version .....................['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] 11.2deepspeed info deepspeed install path................... ...........0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w.['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] ......deepspeed info torch 1.8, cuda 11.1................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 
11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... DeepSpeed general environment info:['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... torch install path0.4.2+bc17042, bc17042, big-science ...............deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science DeepSpeed general environment info: deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 /bin/sh: line 0: type: git: not found torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... 
['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found DeepSpeed general environment info: /bin/sh: line 0: type: git: not found torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 /bin/sh: line 0: type: git: not found torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science /bin/sh: line 0: type: git: not found deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found DeepSpeed general environment info: /bin/sh: line 0: type: git: not found torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] /bin/sh: line 0: type: git: not found DeepSpeed general environment info: /bin/sh: line 0: type: git: not found torch version .................... 1.8.1 /bin/sh: line 0: type: git: not found torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 /bin/sh: line 0: type: git: not found  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. 
/bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found torch cuda version ............... 11.1 /bin/sh: line 0: type: git: not found torch cuda version ............... 11.1 async_io ............... [NO] ....... [NO] nvcc version ..................... 11.2 nvcc version ..................... 11.2 transformer_inference .. [NO] ....... [OKAY] deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] utils .................. [YES] ...... [OKAY] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']DeepSpeed general environment info: DeepSpeed general environment info: torch version .................... 1.8.1 torch install path torch cuda version............... ............... 11.1 torch install path ............... 
DeepSpeed general environment info:
torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']
torch version .................... 1.8.1
torch cuda version ............... 11.1
nvcc version ..................... 11.2
deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']
deepspeed info ................... 0.4.2+bc17042, bc17042, big-science
deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1
/bin/sh: line 0: type: git: not found
**** Git info for Megatron: git_hash=unknown git_branch=unknown ****
--------------------------------------------------
DeepSpeed C++/CUDA extension op report
--------------------------------------------------
NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.
--------------------------------------------------
JIT compiled ops requires ninja
 [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`.
async_io ............... [NO] ....... [NO]
transformer_inference .. [NO] ....... [OKAY]
utils .................. [YES] ...... [OKAY]
quantizer .............. [NO] ....... [OKAY]
--------------------------------------------------
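The dotted key/value reports above are plain text, so they are easy to pull apart when comparing ranks. A throwaway helper for that (a sketch, not part of the training code; the `parse_report` name is made up here):

```python
import re

def parse_report(text):
    """Parse 'key .... value' lines from a DeepSpeed-style report
    into a dict, skipping separator and banner lines."""
    info = {}
    for line in text.splitlines():
        # key, a run of 2+ dots as the leader, then the value
        m = re.match(r"\s*(.+?)\s*\.{2,}\s*(.+)", line)
        if m:
            info[m.group(1)] = m.group(2)
    return info

report = """\
torch version .................... 1.8.1
torch cuda version ............... 11.1
nvcc version ..................... 11.2
"""
print(parse_report(report)["torch version"])   # 1.8.1
```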
using world size: 512, data-parallel-size: 16, tensor-model-parallel size: 4, pipeline-model-parallel size: 8
using torch.float16 for parameters ...
------------------------ arguments ------------------------
accumulate_allreduce_grads_in_fp32 .............. False
adam_beta1 ...................................... 0.9
adam_beta2 ...................................... 0.999
adam_eps ........................................ 1e-08
adlr_autoresume ................................. False
adlr_autoresume_interval ........................ 1000
apply_query_key_layer_scaling ................... True
apply_residual_connection_post_layernorm ........ False
attention_dropout ............................... 0.1
attention_softmax_in_fp32 ....................... False
bert_binary_head ................................ True
bert_load ....................................... None
bf16 ............................................ False
bias_dropout_fusion ............................. True
bias_gelu_fusion ................................ True
biencoder_projection_dim ........................ 0
biencoder_shared_query_context_model ............ False
block_data_path ................................. None
checkpoint_activations .......................... True
checkpoint_in_cpu ............................... False
checkpoint_num_layers ........................... 1
clip_grad ....................................... 1.0
codecarbon_dir .................................. /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/tr8-104B-logs/codecarbon
consumed_train_samples .......................... 0
consumed_valid_samples .......................... 0
contigious_checkpointing ........................ False
cpu_optimizer ................................... False
cpu_torch_adam .................................. False
data_impl ....................................... mmap
data_parallel_size .............................. 16
data_path ....................................... ['/gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document']
dataloader_type ................................. single
DDP_impl ........................................ local
decoder_seq_length .............................. None
deepscale ....................................... False
deepscale_config ................................ None
deepspeed ....................................... True
deepspeed_activation_checkpointing .............. True
deepspeed_config ................................ ./ds_config.1188168.json
deepspeed_mpi ................................... False
distribute_checkpointed_activations ............. False
distributed_backend ............................. nccl
embedding_path .................................. None
encoder_seq_length .............................. 2048
eod_mask_loss ................................... False
eval_interval ................................... 1000
eval_iters ...................................... 5
evidence_data_path .............................. None
exit_duration_in_mins ........................... 1190
exit_interval ................................... None
ffn_hidden_size ................................. 20480
finetune ........................................ False
fp16 ............................................ True
fp16_lm_cross_entropy ........................... False
fp32_residual_connection ........................ False
global_batch_size ............................... 2048
hidden_dropout .................................. 0.1
hidden_size ..................................... 16384
hysteresis ...................................... 2
ict_head_size ................................... None
ict_load ........................................ None
img_dim ......................................... 224
indexer_batch_size .............................. 128
indexer_log_interval ............................ 1000
init_method_std ................................. 0.02
init_method_xavier_uniform ...................... False
initial_loss_scale .............................. 4294967296
kv_channels ..................................... 512
layernorm_epsilon ............................... 1e-05
lazy_mpu_init ................................... None
load ............................................ /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints
local_rank ...................................... 0
log_batch_size_to_tensorboard ................... True
log_interval .................................... 10
log_learning_rate_to_tensorboard ................ True
log_loss_scale_to_tensorboard ................... True
log_num_zeros_in_grad ........................... False
log_params_norm ................................. False
log_timers_to_tensorboard ....................... True
log_validation_ppl_to_tensorboard ............... True
loss_scale ...................................... 12.0
loss_scale_window ............................... 1000
lr .............................................. 6e-05
lr_decay_iters .................................. None
lr_decay_samples ................................ 126953125
lr_decay_style .................................. cosine
lr_warmup_fraction .............................. None
lr_warmup_iters ................................. 0
lr_warmup_samples ............................... 216320
make_vocab_size_divisible_by .................... 128
mask_prob ....................................... 0.15
masked_softmax_fusion ........................... True
max_position_embeddings ......................... 2048
memory_centric_tiled_linear ..................... False
merge_file ...................................... /gpfswork/rech/six/commun/code/tr8-104B/Megatron-DeepSpeed-tr8-104B/data/gpt2-merges.txt
micro_batch_size ................................ 1
min_loss_scale .................................. 1.0
min_lr .......................................... 6e-06
mmap_warmup ..................................... False
no_load_optim ................................... None
no_load_rng ..................................... None
no_save_optim ................................... None
no_save_rng ..................................... None
num_attention_heads ............................. 32
num_channels .................................... 3
num_classes ..................................... 1000
num_layers ...................................... 32
num_layers_per_virtual_pipeline_stage ........... None
num_workers ..................................... 2
onnx_safe ....................................... None
openai_gelu ..................................... False
optimizer ....................................... adam
override_lr_scheduler ........................... False
params_dtype .................................... torch.float16
partition_activations ........................... False
patch_dim ....................................... 16
pipeline_model_parallel_size .................... 8
position_embedding_type ......................... PositionEmbeddingType.absolute
profile_backward ................................ False
query_in_block_prob ............................. 0.1
rampup_batch_size ............................... ['16', '16', '6_000_000']
rank ............................................ 0
remote_device ................................... none
reset_attention_mask ............................ False
reset_position_ids .............................. False
retriever_report_topk_accuracies ................ []
retriever_score_scaling ......................... False
retriever_seq_length ............................ 256
sample_rate ..................................... 1.0
save ............................................ /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints
save_interval ................................... 1500
scatter_gather_tensors_in_pipeline .............. True
scattered_embeddings ............................ False
seed ............................................ 42
seq_length ...................................... 2048
sgd_momentum .................................... 0.9
short_seq_prob .................................. 0.1
split ........................................... 949,50,1
split_transformers .............................. False
synchronize_each_layer .......................... False
tensor_model_parallel_size ...................... 4
tensorboard_dir ................................. /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/tr8-104B-logs/tensorboard
tensorboard_log_interval ........................ 1
tensorboard_queue_size .......................... 5
tile_factor ..................................... 1
titles_data_path ................................ None
tokenizer_name_or_path .......................... None
tokenizer_type .................................. GPT2BPETokenizer
train_iters ..................................... None
train_samples ................................... 300000000
use_checkpoint_lr_scheduler ..................... False
use_contiguous_buffers_in_ddp ................... False
use_cpu_initialization .......................... None
use_one_sent_docs ............................... False
use_pin_memory .................................. False
virtual_pipeline_model_parallel_size ............ None
vocab_extra_ids ................................. 0
vocab_file ...................................... /gpfswork/rech/six/commun/code/tr8-104B/Megatron-DeepSpeed-tr8-104B/data/gpt2-vocab.json
weight_decay .................................... 0.1
world_size ...................................... 512
zero_allgather_bucket_size ...................... 0.0
zero_contigious_gradients ....................... False
zero_reduce_bucket_size ......................... 0.0
zero_reduce_scatter ............................. False
zero_stage ...................................... 1
-------------------- end of arguments ---------------------
will use batch size rampup starting from global batch size 16 to global batch size 2048 with batch size increments 16 over 6000000 samples.
> building GPT2BPETokenizer tokenizer ...
ninja .................. [OKAY]
--------------------------------------------------
op name ................ installed .. compatible
--------------------------------------------------
cpu_adam ............... [YES] ...... [OKAY]
fused_adam ............. [NO] ....... [OKAY]
fused_lamb ............. [NO] ....... [OKAY]
sparse_attn ............ [NO] ....... [OKAY]
transformer ............ [NO] ....... [OKAY]
stochastic_transformer . [NO] ....... [OKAY]
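The ramp-up line in the log can be sanity-checked with a few lines of arithmetic. This is a sketch assuming evenly spaced increments (it mirrors the `rampup_batch_size = ['16', '16', '6_000_000']` argument, not Megatron-LM's actual implementation), plus a check that the three parallelism degrees multiply out to the world size:

```python
def batch_size_at(consumed_samples, start=16, increment=16,
                  final=2048, rampup_samples=6_000_000):
    """Global batch size after `consumed_samples`, assuming the ramp-up
    increments are spread evenly over `rampup_samples`."""
    num_increments = (final - start) // increment   # (2048-16)/16 = 127
    steps = consumed_samples * num_increments // rampup_samples
    return min(start + steps * increment, final)

# world size is the product of the three parallelism degrees:
# 512 GPUs = 16 (data) x 4 (tensor) x 8 (pipeline)
assert 16 * 4 * 8 == 512

print(batch_size_at(0))            # 16
print(batch_size_at(6_000_000))    # 2048
```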
11.1 nvcc version ..................... 11.2 DeepSpeed general environment info: deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science torch version .................... 1.8.1 torch cuda version ............... 11.1 deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: -------------------------------------------------- torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] DeepSpeed C++/CUDA extension op report -------------------------------------------------- torch version .................... 1.8.1 NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja torch cuda version ............... 11.1 nvcc version ..................... 11.2 ninja .................. [OKAY] deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science -------------------------------------------------- deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. 
[NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] /bin/sh: line 0: type: git: not found transformer ............ [NO] ....... [OKAY] /bin/sh: line 0: type: git: not found stochastic_transformer . [NO] ....... [OKAY] /bin/sh: line 0: type: git: not found DeepSpeed general environment info: /bin/sh: line 0: type: git: not found  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] /bin/sh: line 0: type: git: not found async_io ............... [NO] ....... [NO] torch version .................... 1.8.1 transformer_inference .. [NO] ....... [OKAY] ninjaninja .................. ..................[OKAY] [OKAY]-------------------------------------------------- torch cuda version ............... 11.1 utils .................. [YES] ...... [OKAY] --------------------------------------------------op name nvcc version ..................... 11.2 quantizer .............. [NO] ....... [OKAY] ................ op nameinstalled .................. compatibleinstalled --------------------------------------------------.. compatible -------------------------------------------------- deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science -------------------------------------------------- cpu_adam ............... [YES]cpu_adam ...... ...............[OKAY] deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 [YES] ...... [OKAY] fused_adam ............. [NO] ....... fused_adam[OKAY] ............. fused_lamb[NO] .................... [NO] [OKAY]....... [OKAY] fused_lamb ............. [NO] ....... 
[OKAY] /bin/sh: line 0: type: git: not found sparse_attn ............ [NO] ....... [OKAY] sparse_attntransformer ........................ [NO][NO] .............. [OKAY][OKAY] transformer stochastic_transformer............ [NO]. [NO]....... .......[OKAY] [OKAY] stochastic_transformer . [NO] ....... [OKAY] **** Git info for Megatron: git_hash=unknown git_branch=unknown **** -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] ninja .................. [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. DeepSpeed general environment info: async_io  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`................  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. 
Can be fixed by: `apt install libaio-dev`.[NO] ....... [NO] torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] async_iotransformer_inference ................. async_io[NO][NO] ............................. [NO][OKAY][NO] torch version .................... 1.8.1 torch cuda version ............... 11.1 ....... [NO] /bin/sh: line 0: type: git: not found nvcc version ..................... 11.2 utils .................. [YES] ...... [OKAY] deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] transformer_inference transformer_inference.. ..[NO]quantizer .......[NO].............. [OKAY][NO]....... .......[OKAY] [OKAY] utils--------------------------------------------------utils deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 .................................... [YES][YES] ............ [OKAY][OKAY] quantizerquantizer ............................ [NO][NO] .............. [OKAY][OKAY] /bin/sh: line 0: type: git: not found ---------------------------------------------------------------------------------------------------- -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- DeepSpeed general environment info: cpu_adam ............... [YES] ...... [OKAY] torch install path ............... 
['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] fused_adam ............. [NO] ....... [OKAY] torch version .................... 1.8.1 torch cuda version ............... 11.1 fused_lamb ............. [NO] ....... [OKAY] nvcc version ..................... 11.2 sparse_attn ............ [NO] ....... [OKAY] deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] transformer ............ [NO] ....... [OKAY] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science stochastic_transformer . [NO] ....... [OKAY] deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] /bin/sh: line 0: type: git: not found fused_lamb ............. [NO] ....... [OKAY] /bin/sh: line 0: type: git: not found sparse_attn ............ [NO] ....... [OKAY] /bin/sh: line 0: type: git: not found transformer ............ [NO] ....... [OKAY] /bin/sh: line 0: type: git: not found stochastic_transformer . [NO] ....... [OKAY] /bin/sh: line 0: type: git: not found DeepSpeed general environment info: /bin/sh: line 0: type: git: not found torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 
11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. ninja .................. [OKAY] DeepSpeed general environment info: torch cuda version ............... 11.1 async_io ............... [NO] ....... [NO] -------------------------------------------------- DeepSpeed general environment info: torch install path ............... torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] nvcc version ..................... 11.2 transformer_inference .. [NO] ....... [OKAY] op name ................ installed .. compatible -------------------------------------------------- DeepSpeed general environment info: torch version['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] .................... 1.8.1torch version ....................torch cuda version 1.8.1............... deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] utils .................. [YES] ...... [OKAY] cpu_adam ............... [YES] ...... [OKAY] torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 -------------------------------------------------- 11.1 torch cuda versionnvcc version .................................... 11.111.2 deepspeed info ................... 
0.4.2+bc17042, bc17042, big-science quantizer .............. [NO] ....... [OKAY] fused_adam ............. [NO] ....... [OKAY] torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science DeepSpeed C++/CUDA extension op report -------------------------------------------------- nvcc versiondeepspeed install path ................................ 11.2 ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']deepspeed install path deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`.-------------------------------------------------- fused_lamb ............. [NO] ....... [OKAY] deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja deepspeed info........... ................... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']0.4.2+bc17042, bc17042, big-science async_io ............... [NO] ....... [NO] sparse_attn ............ [NO] ....... [OKAY] DeepSpeed general environment info: deepspeed wheel compiled w.deepspeed info ......................... torch 1.8, cuda 11.10.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 transformer_inference .. [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] utils .................. [YES] ...... 
[OKAY] stochastic_transformer . [NO] ....... [OKAY] torch version .................... 1.8.1 quantizer .............. [NO] ....... [OKAY] torch cuda version ............... 11.1 **** Git info for Megatron: git_hash=unknown git_branch=unknown **** -------------------------------------------------- nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- **** Git info for Megatron: git_hash=unknown git_branch=unknown ****  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] **** Git info for Megatron: git_hash=unknown git_branch=unknown **** utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] DeepSpeed general environment info: -------------------------------------------------- torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 
11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 DeepSpeed general environment info: torch cuda version ............... 11.1 torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] nvcc version ..................... 11.2 torch version .................... 1.8.1 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 torch cuda version ............... 11.1 /bin/sh: line 0: type: git: not found nvcc version ..................... 11.2 /bin/sh: line 0: type: git: not found deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science /bin/sh: line 0: type: git: not found deepspeed wheel compiled w. ...... 
torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 **** Git info for Megatron: git_hash=unknown git_branch=unknown **** torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] ninja .................. [OKAY] -------------------------------------------------- deepspeed info ................... 0.4.2+bc17042, bc17042, big-science op name ................ installed .. compatible -------------------------------------------------- deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- **** Git info for Megatron: git_hash=unknown git_branch=unknown **** -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. 
-------------------------------------------------- JIT compiled ops requires ninja **** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. DeepSpeed general environment info: async_io ............... async_io[NO] ...................... [NO][NO] DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] ....... [NO] torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 transformer_inference .. transformer_inference[NO] ......... [NO][OKAY] ....... [OKAY] torch version .................... 1.8.1 torch cuda version ............... 11.1 utils .................. utils[YES] ........................ [YES][OKAY] torch cuda version ............... 11.1 nvcc version ..................... 11.2 ...... [OKAY] quantizer .............. [NO] quantizer....... ..............[OKAY] nvcc version ..................... 11.2 /bin/sh: line 0: type: git: not found deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] [NO] ....... 
[OKAY] -------------------------------------------------- -------------------------------------------------- deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`.  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. **** Git info for Megatron: git_hash=unknown git_branch=unknown **** async_io ............... [NO] ....... [NO] async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] **** Git info for Megatron: git_hash=unknown git_branch=unknown **** transformer_inference .. [NO] utils....... ..................[OKAY] DeepSpeed general environment info: [YES] ...... [OKAY] torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 utils ..................quantizer [YES].............. ......[NO] [OKAY]....... DeepSpeed general environment info:torch cuda version ............... 11.1 [OKAY] quantizer .............. 
--------------------------------------------------[NO] nvcc versiontorch install path .................................... 11.2 ....... [OKAY] -------------------------------------------------- deepspeed install path ........... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info torch version................... ....................0.4.2+bc17042, bc17042, big-science 1.8.1deepspeed wheel compiled w. ...... torch cuda versiontorch 1.8, cuda 11.1 ............... 11.1 nvcc version ..................... 11.2 /bin/sh: line 0: type: git: not found deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] /bin/sh: line 0: type: git: not found deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. 
 [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`.

--------------------------------------------------
DeepSpeed C++/CUDA extension op report
--------------------------------------------------
NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.
--------------------------------------------------
JIT compiled ops requires ninja

ninja .................. [OKAY]
--------------------------------------------------
op name ................ installed .. compatible
--------------------------------------------------
cpu_adam ............... [YES] ...... [OKAY]
fused_adam ............. [NO] ....... [OKAY]
fused_lamb ............. [NO] ....... [OKAY]
sparse_attn ............ [NO] ....... [OKAY]
transformer ............ [NO] ....... [OKAY]
stochastic_transformer . [NO] ....... [OKAY]
async_io ............... [NO] ....... [NO]
transformer_inference .. [NO] ....... [OKAY]
utils .................. [YES] ...... [OKAY]
quantizer .............. [NO] ....... [OKAY]
--------------------------------------------------

DeepSpeed general environment info:
torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']
torch version .................... 1.8.1
torch cuda version ............... 11.1
nvcc version ..................... 11.2
deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']
deepspeed info ................... 0.4.2+bc17042, bc17042, big-science
deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1

/bin/sh: line 0: type: git: not found
**** Git info for Megatron: git_hash=unknown git_branch=unknown ****
[OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown ******** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info: torch install path ............... 
['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`.  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO]async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... transformer_inference[OKAY] .. [NO] ....... [OKAY]utils .................. [YES] ...... [OKAY] **** Git info for Megatron: git_hash=unknown git_branch=unknown **** utils .................. [YES]quantizer .................... [OKAY][NO] ....... [OKAY] quantizer .............. [NO]-------------------------------------------------- ....... [OKAY] -------------------------------------------------- **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown ****  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`.  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] .......async_io [NO]............... /bin/sh: line 0: type: git: not found [NO] ....... [NO] /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found transformer_inference .. [NO] ....... [OKAY] transformer_inference .. [NO] utils....... ..................[OKAY] [YES] ...... [OKAY] utils ..................quantizer [YES].............. ......[NO] [OKAY]....... 
[OKAY] quantizer ..............-------------------------------------------------- [NO] ....... [OKAY] -------------------------------------------------- **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown ******** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown ****  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`........ [OKAY] **** Git info for Megatron: git_hash=unknown git_branch=unknown **** utils .................. [YES] ...... [OKAY] quantizer async_io.............. ...............[NO] [NO]....... .......[OKAY] [NO] -------------------------------------------------- transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] DeepSpeed general environment info: quantizer .............. [NO] ....... [OKAY] torch install path ............... 
['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] -------------------------------------------------- torch version .................... 1.8.1 torch cuda version ............... 11.1 **** Git info for Megatron: git_hash=unknown git_branch=unknown **** nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 **** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 **** Git info for Megatron: git_hash=unknown git_branch=unknown **** deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 **** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 **** Git info for Megatron: git_hash=unknown git_branch=unknown **** nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... 
['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 /bin/sh: line 0: type: git: not found torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown ******** Git info for Megatron: git_hash=unknown git_branch=unknown **** -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. 
Can be fixed by: `apt install libaio-dev`. deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] **** Git info for Megatron: git_hash=unknown git_branch=unknown **** async_io ............... [NO] ....... [NO] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science transformer_inference .. [NO] ....... [OKAY] deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. utils .................. [YES] ...... [OKAY] DeepSpeed general environment info: async_io ............... [NO] ....... [NO] quantizer .............. [NO] ....... [OKAY] torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] transformer_inference .. [NO] ....... [OKAY] -------------------------------------------------- /bin/sh: line 0: type: git: not found torch version .................... 1.8.1 utils .................. [YES] ...... [OKAY] torch cuda version ............... 11.1 quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. nvcc version ..................... 11.2 -------------------------------------------------- -------------------------------------------------- JIT compiled ops requires ninja deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... 
['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] async_io ............... [NO] ....... [NO] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science transformer_inference .. [NO] ....... [OKAY] deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown ****  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] **** Git info for Megatron: git_hash=unknown git_branch=unknown **** transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] **** Git info for Megatron: git_hash=unknown git_branch=unknown **** quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info: **** Git info for Megatron: git_hash=unknown git_branch=unknown **** torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 
11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] DeepSpeed general environment info: deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 ninjaninja .................................... [OKAY][OKAY] ---------------------------------------------------------------------------------------------------- op nameop name ................ ................installed installed.. compatible.. 
compatible-------------------------------------------------- -------------------------------------------------- cpu_adam ...............cpu_adam [YES] ..................... [OKAY][YES] ...... [OKAY] fused_adam ............. fused_adam[NO] .................... [OKAY] [NO] ....... fused_lamb[OKAY] ............. [NO] .......fused_lamb [OKAY]............. [NO] ....... [OKAY] sparse_attn ............ sparse_attn[NO] ................... [OKAY][NO] ....... transformer[OKAY] ............ [NO] ....... [OKAY]transformer ............ [NO] stochastic_transformer....... [OKAY]. [NO] ....... stochastic_transformer[OKAY] . [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 -------------------------------------------------- DeepSpeed C++/CUDA extension op report nvcc version ..................... 11.2 -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... 
torch 1.8, cuda 11.1 **** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science ---------------------------------------------------------------------------------------------------- DeepSpeed C++/CUDA extension op reportDeepSpeed C++/CUDA extension op report ---------------------------------------------------------------------------------------------------- deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. ---------------------------------------------------------------------------------------------------- JIT compiled ops requires ninja JIT compiled ops requires ninja DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] ninja .................. [OKAY] torch version .................... 1.8.1 -------------------------------------------------- torch cuda version ............... 11.1 op name ................ installed .. compatible -------------------------------------------------- nvcc version ..................... 11.2 deepspeed install path ........... 
['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] cpu_adam ............... [YES] ...... [OKAY] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science fused_adam ............. [NO] ....... [OKAY] deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] **** Git info for Megatron: git_hash=unknown git_branch=unknown **** transformer ............ [NO] ....... [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. stochastic_transformer . [NO] ....... [OKAY] async_io ............... [NO] ....... [NO] ninja .................. [OKAY] transformer_inference .. [NO] ....... [OKAY] -------------------------------------------------- utils .................. [YES] ...... [OKAY] op name ................ installed .. compatible -------------------------------------------------- quantizer .............. [NO] ....... [OKAY] cpu_adam ............... [YES] ...... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. fused_adam ............. [NO] ....... [OKAY] async_ioasync_io .............................. [NO] [NO]....... .......[NO] [NO] fused_lamb ............. [NO] ....... [OKAY] transformer_inference transformer_inference.. ..[NO] [NO]....... .......[OKAY] [OKAY] sparse_attn ............ [NO] ....... [OKAY] **** Git info for Megatron: git_hash=unknown git_branch=unknown **** utilsutils .................................... [YES][YES] ............ [OKAY][OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] quantizerquantizer ............................ [NO][NO] .............. 
[OKAY][OKAY] ---------------------------------------------------------------------------------------------------- **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown ****  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] -------------------------------------------------- utils .................. [YES] ...... [OKAY] DeepSpeed C++/CUDA extension op report -------------------------------------------------- quantizer .............. [NO] ....... [OKAY] NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 **** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2DeepSpeed general environment info: deepspeed install path ........... 
['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed infotorch install path ................... ...............0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja **** Git info for Megatron: git_hash=unknown git_branch=unknown ****  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] **** Git info for Megatron: git_hash=unknown git_branch=unknown **** -------------------------------------------------- **** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... 
[WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`.
async_io ............... [NO] ....... [NO]
transformer_inference .. [NO] ....... [OKAY]
utils .................. [YES] ...... [OKAY]
quantizer .............. [NO] ....... [OKAY]
--------------------------------------------------
ninja .................. [OKAY]
--------------------------------------------------
op name ................ installed .. compatible
--------------------------------------------------
cpu_adam ............... [YES] ...... [OKAY]
fused_adam ............. [NO] ....... [OKAY]
fused_lamb ............. [NO] ....... [OKAY]
sparse_attn ............ [NO] ....... [OKAY]
transformer ............ [NO] ....... [OKAY]
stochastic_transformer . [NO] ....... [OKAY]
--------------------------------------------------
/bin/sh: line 0: type: git: not found
DeepSpeed general environment info:
torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']
torch version .................... 1.8.1
torch cuda version ............... 11.1
nvcc version ..................... 11.2
deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']
deepspeed info ................... 0.4.2+bc17042, bc17042, big-science
deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1
**** Git info for Megatron: git_hash=unknown git_branch=unknown ****
> padded vocab (size: 50257) with 431 dummy tokens (new size: 50688)
['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 **** Git info for Megatron: git_hash=unknown git_branch=unknown **** deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... 
torch 1.8, cuda 11.1 **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown ******** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info:DeepSpeed general environment info: torch install pathtorch install path .............................. ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch versiontorch version ........................................ 1.8.11.8.1 torch cuda versiontorch cuda version .............................. 11.111.1 nvcc versionnvcc version .......................................... 11.211.2 deepspeed install pathdeepspeed install path ...................... 
['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed infodeepspeed info ................... ................... 0.4.2+bc17042, bc17042, big-science0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w.deepspeed wheel compiled w. ............ torch 1.8, cuda 11.1torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 **** Git info for Megatron: git_hash=unknown git_branch=unknown ******** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info:DeepSpeed general environment info: torch install pathtorch install path .............................. ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch versiontorch version ........................................ 1.8.11.8.1 torch cuda versiontorch cuda version .............................. 11.111.1 nvcc versionnvcc version .......................................... 11.211.2 deepspeed install pathdeepspeed install path ...................... 
['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed infodeepspeed info ...................................... 0.4.2+bc17042, bc17042, big-science0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w.deepspeed wheel compiled w. ............ torch 1.8, cuda 11.1torch 1.8, cuda 11.1 **** Git info for Megatron: git_hash=unknown git_branch=unknown ******** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info:DeepSpeed general environment info: torch install pathtorch install path .............................. ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch versiontorch version ........................................ 1.8.11.8.1 torch cuda versiontorch cuda version .............................. 11.111.1 nvcc versionnvcc version .......................................... 11.211.2 deepspeed install pathdeepspeed install path ...................... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed infodeepspeed info ...................................... 0.4.2+bc17042, bc17042, big-science0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w.deepspeed wheel compiled w. ............ 
torch 1.8, cuda 11.1torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** > setting codecarbon ... **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown ******** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown ******** Git info for Megatron: git_hash=unknown git_branch=unknown **** -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninjaninja .................................... 
[OKAY][OKAY] ---------------------------------------------------------------------------------------------------- op nameop name ................................ installedinstalled .... compatiblecompatible ---------------------------------------------------------------------------------------------------- cpu_adam ...............cpu_adam [YES] ..................... [YES][OKAY] ...... [OKAY] fused_adam ............. fused_adam[NO] .................... [OKAY][NO] ....... [OKAY] fused_lamb ............. fused_lamb[NO] .................... [NO][OKAY] ....... [OKAY] sparse_attn ............ sparse_attn[NO] ................... [NO][OKAY] ....... [OKAY] transformer transformer............ ............[NO] [NO]....... .......[OKAY] [OKAY] stochastic_transformerstochastic_transformer .. [NO][NO] .............. [OKAY][OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... 
[OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_ioasync_io ............... ...............[NO] [NO]....... .......[NO] [NO] transformer_inferencetransformer_inference .... [NO][NO] .............. [OKAY][OKAY] utilsutils .................................... [YES][YES] ............ [OKAY][OKAY] quantizer quantizer.............. ..............[NO] [NO]....... .......[OKAY] [OKAY] ---------------------------------------------------------------------------------------------------- > initializing torch distributed ... 
DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: DeepSpeed general environment info:torch install path ............... torch install path ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']............... torch version .................... 1.8.1['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch cuda versiontorch version ................................... 11.11.8.1 nvcc version torch cuda version..................... ...............11.2 11.1deepspeed install path nvcc version........... ..................... 11.2['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed install pathdeepspeed info .............................. 0.4.2+bc17042, bc17042, big-science ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']deepspeed wheel compiled w. deepspeed info...... 
...................torch 1.8, cuda 11.1 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** ------------------------------------------------------------------------------------------------------------------------------------------------------ DeepSpeed C++/CUDA extension op report DeepSpeed C++/CUDA extension op report DeepSpeed C++/CUDA extension op report ---------------------------------------------------------------------------------------------------- --------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.---------------------------------------------------------------------------------------------------- ----------------------------------------------------------------------------------------------------JIT compiled ops requires ninjaJIT compiled ops requires ninja JIT compiled ops requires ninjaDeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. 
Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninjaninjaninjaninja ...................................................... .................. [OKAY][OKAY][OKAY][OKAY] ---------------------------------------------------------------------------------------------------- ---------------------------------------------------------------------------------------------------- op name op nameop nameop name................ ................................installed ................ installed installed ..installed .... compatible compatible.. compatible ---------------------------------------------------------------------------------------------------- compatible ---------------------------------------------------------------------------------------------------- cpu_adamcpu_adam cpu_adam...............cpu_adam .............................. [YES] ............... [YES][YES] ...... [YES]......[OKAY]...... [OKAY] ...... [OKAY] [OKAY] fused_adam ............. [NO] .......fused_adamfused_adam fused_adam [OKAY] .......................... ............. [NO] fused_lamb [NO][NO] ....... .................... [OKAY]....... [NO][OKAY] [OKAY] fused_lamb.......fused_lamb [OKAY] fused_lamb ............. ............. ............. [NO] [NO] [NO]....... ..............[OKAY] [OKAY][OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ sparse_attn[NO]sparse_attnsparse_attn ........................................... [OKAY] [NO] .......[NO] stochastic_transformer[NO] [OKAY].............. . [OKAY] [OKAY] transformer[NO] transformer............transformer....... ............[OKAY]............[NO] [NO].......[NO] .......[OKAY]....... [OKAY][OKAY] stochastic_transformer stochastic_transformer. stochastic_transformer[NO] . . ....... [NO] [NO] [OKAY].............. [OKAY][OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. 
Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed general environment info: DeepSpeed general environment info: torch install path ............... torch install path DeepSpeed general environment info:...............['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version ....................['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']torch install path 1.8.1............... torch version torch cuda version.................... ...............1.8.1 ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']11.1 torch cuda version nvcc versiontorch version ............... ......................................... 
1.8.111.111.2 nvcc versiondeepspeed install pathtorch cuda version ............................................... 11.2 11.1['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed install pathnvcc version deepspeed info........... ........................................ ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']11.20.4.2+bc17042, bc17042, big-science deepspeed infodeepspeed install path deepspeed wheel compiled w. ................... ........... ...... 0.4.2+bc17042, bc17042, big-science torch 1.8, cuda 11.1['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed wheel compiled w. deepspeed info ......................... torch 1.8, cuda 11.10.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... 
torch 1.8, cuda 11.1 -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- DeepSpeed C++/CUDA extension op reportDeepSpeed C++/CUDA extension op reportDeepSpeed C++/CUDA extension op report DeepSpeed C++/CUDA extension op report ------------------------------------------------------------------------------------------------------------------------------------------------------ -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- -------------------------------------------------- ---------------------------------------------------------------------------------------------------- JIT compiled ops requires ninja JIT compiled ops requires ninja JIT compiled ops requires ninja JIT compiled ops requires ninja ninjaninjaninjaninja ........................................................................ [OKAY][OKAY][OKAY][OKAY] ---------------------------------------------------------------------------------------------------- ---------------------------------------------------------------------------------------------------- op name op nameop nameop name ................ 
................................ ................ installed installedinstalledinstalled .... .. .. compatiblecompatiblecompatiblecompatible ------------------------------------------------------------------------------------------------------------------------------------------------------ -------------------------------------------------- cpu_adam cpu_adamcpu_adam............... cpu_adam...............[YES]............... ............... [YES][YES] ...... [YES]...... ...... [OKAY] ......[OKAY] [OKAY] [OKAY] fused_adamfused_adam .............fused_adam............. fused_adam [NO][NO]............. ............. ....... .......[NO][NO][OKAY] .......[OKAY]....... fused_lamb[OKAY][OKAY] fused_lamb ............. .............[NO] fused_lamb fused_lamb[NO] ....... ............. .................... [OKAY] [NO] [NO] [OKAY] ....... ....... [OKAY][OKAY] sparse_attn ............ [NO] ....... [OKAY]sparse_attn ............ transformer[NO]sparse_attn sparse_attn............................... [OKAY]............[NO] [NO] [NO] ....... transformer....... ....... [OKAY] ............[OKAY][OKAY] [NO]transformer .......stochastic_transformer............ transformer[OKAY] .[NO]............ [NO].......[NO]stochastic_transformer [OKAY].............. .[OKAY] [OKAY][NO]stochastic_transformer ....... .[OKAY]stochastic_transformer [NO] ........ [NO][OKAY] ....... [OKAY] /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown ******** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown ******** Git info for Megatron: git_hash=unknown git_branch=unknown ****  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... 
 [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`.
async_io ............... [NO] ....... [NO]
transformer_inference .. [NO] ....... [OKAY]
utils .................. [YES] ...... [OKAY]
quantizer .............. [NO] ....... [OKAY]
--------------------------------------------------
--------------------------------------------------
DeepSpeed C++/CUDA extension op report
--------------------------------------------------
NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.
--------------------------------------------------
JIT compiled ops requires ninja
ninja .................. [OKAY]
--------------------------------------------------
op name ................ installed .. compatible
--------------------------------------------------
cpu_adam ............... [YES] ...... [OKAY]
fused_adam ............. [NO] ....... [OKAY]
fused_lamb ............. [NO] ....... [OKAY]
sparse_attn ............ [NO] ....... [OKAY]
transformer ............ [NO] ....... [OKAY]
stochastic_transformer . [NO] ....... [OKAY]
--------------------------------------------------
DeepSpeed general environment info:
torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']
torch version .................... 1.8.1
torch cuda version ............... 11.1
nvcc version ..................... 11.2
deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']
deepspeed info ................... 0.4.2+bc17042, bc17042, big-science
deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1
/bin/sh: line 0: type: git: not found
**** Git info for Megatron: git_hash=unknown git_branch=unknown ****
[OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. 
-------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report-------------------------------------------------- -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.DeepSpeed C++/CUDA extension op report -------------------------------------------------- --------------------------------------------------JIT compiled ops requires ninja NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_ioasync_io .............................. [NO][NO] .............. [NO][NO] transformer_inferencetransformer_inference .... 
[NO][NO] .............. [OKAY][OKAY] utils ..................utils [YES].................. ......[YES] [OKAY]...... [OKAY] quantizer .............. quantizer[NO] ..................... [NO][OKAY] ....... [OKAY] -------------------------------------------------- -------------------------------------------------- ninjaninjaninjaninja .................. .................................... ..................[OKAY] [OKAY][OKAY][OKAY] -------------------------------------------------- ----------------------------------------------------------------------------------------------------op name -------------------------------------------------- ................ op nameop name op name ................installed ................ ..................installedinstalled compatible..installed.. --------------------------------------------------compatible.. compatible -------------------------------------------------- compatible -------------------------------------------------- -------------------------------------------------- cpu_adam ...............cpu_adam [YES]...............cpu_adam cpu_adam ......[YES] ............... ...............[OKAY] ...... [YES] [YES][OKAY] ............ [OKAY][OKAY] fused_adam ............. [NO] fused_adam....... .............[OKAY] fused_adam[NO]fused_adam fused_lamb................................. [OKAY] .............[NO][NO] [NO] fused_lamb....... ....... ....................[OKAY] [NO][OKAY][OKAY] fused_lamb....... [OKAY]fused_lamb............. .............[NO] [NO]....... .......[OKAY] [OKAY] sparse_attn ............ [NO] sparse_attn....... ............[OKAY] [NO] .......transformer sparse_attn[OKAY] sparse_attn ............ ............ transformer............ [NO] [NO] ............[NO] ....... .......[NO][OKAY]....... .......[OKAY][OKAY] stochastic_transformer[OKAY] transformer. transformerstochastic_transformer............ [NO] ............ [NO]........ [NO] [NO]....... 
[OKAY] ..............[OKAY] [OKAY][OKAY] stochastic_transformer stochastic_transformer. [NO] ........ [NO][OKAY] ....... [OKAY] /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown ****  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`.  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] transformer_inference .. [NO] utils....... [OKAY] .................. [YES] ......utils [OKAY] .................. [YES] ...... [OKAY] quantizer ..............quantizer [NO].............. [NO] ....... .......[OKAY] [OKAY] -------------------------------------------------- -------------------------------------------------- /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown ****  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... 
[OKAY] -------------------------------------------------- /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** ------------------------------------------------------------------------------------------------------------------------------------------------------ --------------------------------------------------DeepSpeed C++/CUDA extension op reportDeepSpeed C++/CUDA extension op reportDeepSpeed C++/CUDA extension op report ----------------------------------------------------------------------------------------------------DeepSpeed C++/CUDA extension op report -------------------------------------------------- --------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.--------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.-------------------------------------------------- --------------------------------------------------JIT compiled ops requires ninjaJIT compiled ops requires ninja-------------------------------------------------- JIT compiled ops requires ninja JIT compiled ops requires ninja  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. 
async_ioasync_io .............................. [NO][NO] .............. [NO][NO] transformer_inferencetransformer_inference .... [NO][NO] .............. [OKAY][OKAY] utilsutils .................................... [YES][YES] ............ [OKAY][OKAY] quantizerquantizer ............................ [NO][NO] .............. [OKAY][OKAY] ---------------------------------------------------------------------------------------------------- DeepSpeed general environment info:DeepSpeed general environment info: torch install pathtorch install path .............................. ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch versiontorch version ........................................ 1.8.11.8.1 torch cuda versiontorch cuda version .............................. 11.111.1 nvcc versionnvcc version .......................................... 11.211.2 deepspeed install pathdeepspeed install path ...................... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed infodeepspeed info ...................................... 0.4.2+bc17042, bc17042, big-science 0.4.2+bc17042, bc17042, big-sciencedeepspeed wheel compiled w. deepspeed wheel compiled w....... ......torch 1.8, cuda 11.1 torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... 
torch 1.8, cuda 11.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] ninjaninjaninjaninja .................. .................. ..................[OKAY].................. [OKAY] quantizer .............. [NO] ....... [OKAY] [OKAY][OKAY] -------------------------------------------------- ----------------------------------------------------------------------------------------------------op name -------------------------------------------------- -------------------------------------------------- op name ................op name op name installed................ ................installed ................ installed.... installed ..compatiblecompatible.. compatible--------------------------------------------------compatible-------------------------------------------------- ---------------------------------------------------------------------------------------------------- cpu_adamcpu_adam ...............cpu_adamcpu_adam............... [YES]..............................[YES] [YES]...... ......[YES] ...... [OKAY] ......[OKAY] [OKAY] [OKAY] fused_adam .............fused_adam fused_adamfused_adam [NO] ....................................... ....... [NO][NO] [NO][OKAY] .............. .......[OKAY] fused_lamb [OKAY][OKAY]............. [NO]fused_lamb .......fused_lambfused_lamb............. [NO][OKAY] ............. .................... [NO][NO][OKAY] .............. [OKAY][OKAY] sparse_attn ............ [NO] ....... [OKAY] sparse_attntransformer ........................ sparse_attn [NO]sparse_attn [NO] ....... ............................... [OKAY][NO][OKAY][NO] ..............transformer stochastic_transformer ............[OKAY] [OKAY] . [NO] [NO]transformer.......transformer ...............................[OKAY] [NO][OKAY][NO] stochastic_transformer ....... ....... [OKAY]. 
[OKAY] [NO] stochastic_transformer....... stochastic_transformer[OKAY]. .[NO] [NO]....... .......[OKAY] [OKAY] DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed general environment info: torch install path ............... 
['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- ---------------------------------------------------------------------------------------------------- DeepSpeed C++/CUDA extension op report DeepSpeed C++/CUDA extension op report-------------------------------------------------- --------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. 
------------------------------------------------------------------------------------------------------------------------------------------------------ JIT compiled ops requires ninja JIT compiled ops requires ninja DeepSpeed C++/CUDA extension op report -------------------------------------------------- --------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. DeepSpeed C++/CUDA extension op report-------------------------------------------------- JIT compiled ops requires ninja-------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO]async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... transformer_inference[OKAY] .. [NO] ....... utils[OKAY] .................. [YES] ...... utils[OKAY] .................. [YES] quantizer...... ..............[OKAY] [NO] ....... quantizer[OKAY] .............. 
[NO] .......-------------------------------------------------- [OKAY] -------------------------------------------------- ------------------------------------------------------------------------------------------------------------------------------------------------------ DeepSpeed C++/CUDA extension op reportDeepSpeed C++/CUDA extension op report DeepSpeed C++/CUDA extension op report -------------------------------------------------- ----------------------------------------------------------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- --------------------------------------------------DeepSpeed C++/CUDA extension op report -------------------------------------------------- JIT compiled ops requires ninjaJIT compiled ops requires ninja --------------------------------------------------JIT compiled ops requires ninja NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninjaninjaninjaninja ...................................................... .................. 
[OKAY][OKAY][OKAY][OKAY] ---------------------------------------------------------------------------------------------------- ---------------------------------------------------------------------------------------------------- op nameop name op name ................ ................op name................ installedinstalled................installed ....installed.. compatible compatible.. compatible ---------------------------------------------------------------------------------------------------- compatible -------------------------------------------------- --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. cpu_adamcpu_adam ..............................cpu_adam cpu_adam[YES][YES] .................................... ...... [YES][YES][OKAY] ......[OKAY]...... [OKAY][OKAY] async_io ............... [NO] ....... [NO] fused_adam fused_adam............. fused_adam.............[NO] .............fused_adam[NO]....... [NO] .................... [OKAY]....... [OKAY][NO][OKAY] transformer_inference .. [NO] ....... [OKAY] .......fused_lamb [OKAY]fused_lambfused_lamb............. utils .................. [YES] ...... [OKAY] ..........................[NO] fused_lamb[NO] [NO] .................... ....... .......[NO][OKAY] [OKAY] ....... quantizer .............. [NO] ....... [OKAY] [OKAY] [OKAY] -------------------------------------------------- sparse_attnsparse_attnsparse_attn sparse_attn............ ........................ [NO] ............ [NO] .......[NO] [NO] .......[OKAY] ....... ....... [OKAY] [OKAY]transformer[OKAY] ............transformer transformertransformer [NO] ........................................... [NO] [NO][NO][OKAY] ..................... [OKAY][OKAY][OKAY] stochastic_transformer .stochastic_transformer stochastic_transformerstochastic_transformer [NO] .......... [OKAY][NO][NO][NO] ....... ....... ....... 
[OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. [OKAY][OKAY] async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- ninjaninjaninjaninja ...................................................... .................. [OKAY][OKAY] [OKAY] [OKAY] ---------------------------------------------------------------------------------------------------- ---------------------------------------------------------------------------------------------------- op nameop name op name................................op name ................installedinstalled ................installed .... installedcompatiblecompatible.. -------------------------------------------------- ..--------------------------------------------------compatible compatible ---------------------------------------------------------------------------------------------------- cpu_adam ...............cpu_adam [YES]...............cpu_adam cpu_adam ......[YES] ............... [OKAY] .....................[YES] [YES][OKAY]...... ......[OKAY] [OKAY] fused_adam ............. [NO] ....... fused_adam[OKAY] fused_adam ............. fused_adam.............fused_lamb[NO] [NO]................................. .......[NO][NO][OKAY] [OKAY].............. [OKAY]fused_lamb[OKAY] fused_lamb ............. fused_lamb[NO]............. ....................[NO] [NO][OKAY]....... [OKAY].......sparse_attn ............[OKAY] [NO] ....... [OKAY] sparse_attn transformer............ ............sparse_attn[NO] ............[NO]....... sparse_attn .......[NO] [OKAY] ............ [OKAY]....... transformer[NO][OKAY] stochastic_transformer................... transformer [NO].[OKAY] ....... [NO]............[OKAY] .......transformer[NO] [OKAY]............ .......stochastic_transformer [NO][OKAY] ........ 
stochastic_transformer . [NO] ....... [OKAY]
--------------------------------------------------
 [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`.
async_io ............... [NO] ....... [NO]
transformer_inference .. [NO] ....... [OKAY]
utils .................. [YES] ...... [OKAY]
quantizer .............. [NO] ....... [OKAY]
--------------------------------------------------
--------------------------------------------------
DeepSpeed C++/CUDA extension op report
--------------------------------------------------
NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.
--------------------------------------------------
JIT compiled ops requires ninja
--------------------------------------------------
ninja .................. [OKAY]
--------------------------------------------------
op name ................ installed .. compatible
--------------------------------------------------
cpu_adam ............... [YES] ...... [OKAY]
fused_adam ............. [NO] ....... [OKAY]
fused_lamb ............. [NO] ....... [OKAY]
sparse_attn ............ [NO] ....... [OKAY]
transformer ............ [NO] ....... [OKAY]
stochastic_transformer . [NO] ....... [OKAY]
--------------------------------------------------
DeepSpeed general environment info:
torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']
torch version .................... 1.8.1
torch cuda version ............... 11.1
nvcc version ..................... 11.2
deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']
deepspeed info ................... 0.4.2+bc17042, bc17042, big-science
deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1
/bin/sh: line 0: type: git: not found
**** Git info for Megatron: git_hash=unknown git_branch=unknown ****
11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- ninjaninjaninjaninja .................................... .................. .................. [OKAY] [OKAY][OKAY][OKAY] ------------------------------------------------------------------------------------------------------------------------------------------------------ -------------------------------------------------- op nameop nameop name op name ................................................................ installedinstalled installed .. installed.... .. compatiblecompatiblecompatiblecompatible -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- cpu_adamcpu_adamcpu_adamcpu_adam ............... .............................. ............... [YES][YES] [YES][YES] ........................ [OKAY] [OKAY] [OKAY][OKAY] fused_adamfused_adam fused_adamfused_adam ............. ..........................[NO] ............. 
[NO].......[NO] .......[OKAY].......[NO] [OKAY] .......[OKAY] fused_lamb [OKAY] fused_lamb ............. .............fused_lamb[NO] [NO]fused_lamb.................... ....... .............[OKAY][NO] [OKAY][NO]....... .......[OKAY] [OKAY] sparse_attn sparse_attn............ ............[NO]sparse_attnsparse_attn ............[NO]................... .......[OKAY][NO][NO] [OKAY].............. transformer[OKAY][OKAY] transformer ............transformer............ transformer [NO] ............[NO] ............ ....... [NO]....... [NO] [OKAY] [OKAY] .............. [OKAY][OKAY] stochastic_transformerstochastic_transformer stochastic_transformer.stochastic_transformer. .[NO][NO] ........[NO]....... [NO][OKAY].......[OKAY] ....... [OKAY][OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY]async_io async_io............... -------------------------------------------------- ............... [NO] [NO]....... .......[NO] [NO] transformer_inference ..transformer_inference [NO] ......... [NO][OKAY] ....... [OKAY] DeepSpeed general environment info: utils .................. utils[YES] ........................ [OKAY][YES] ...... [OKAY] DeepSpeed general environment info:torch install path ............... quantizer ..............quantizer [NO].............. .......[NO] [OKAY]....... [OKAY] -------------------------------------------------- -------------------------------------------------- torch install path ............... 
['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']1.8.1 torch versiontorch cuda version ................................... 1.8.111.1 nvcc versiontorch cuda version .................................... 11.211.1 deepspeed install path nvcc version........... ..................... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']11.2 deepspeed infodeepspeed install path .............................. 0.4.2+bc17042, bc17042, big-science ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']deepspeed wheel compiled w. deepspeed info...... ...................torch 1.8, cuda 11.1 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 ---------------------------------------------------------------------------------------------------- DeepSpeed C++/CUDA extension op reportDeepSpeed C++/CUDA extension op report ---------------------------------------------------------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.-------------------------------------------------- --------------------------------------------------JIT compiled ops requires ninja JIT compiled ops requires ninja DeepSpeed general environment info:DeepSpeed general environment info: torch install pathtorch install path .............................. 
['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch versiontorch version ........................................ 1.8.11.8.1 torch cuda versiontorch cuda version .............................. 11.111.1 nvcc versionnvcc version .......................................... 11.211.2 deepspeed install pathdeepspeed install path ...................... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed infodeepspeed info ...................................... 0.4.2+bc17042, bc17042, big-science0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w.deepspeed wheel compiled w. ............ torch 1.8, cuda 11.1torch 1.8, cuda 11.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. 
Op compatibility means that your system meet the required dependencies to JIT install the op.-------------------------------------------------- ------------------------------------------------------------------------------------------------------------------------------------------------------ DeepSpeed C++/CUDA extension op reportJIT compiled ops requires ninjaDeepSpeed C++/CUDA extension op report DeepSpeed C++/CUDA extension op report -------------------------------------------------- -------------------------------------------------- -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- -------------------------------------------------- -------------------------------------------------- JIT compiled ops requires ninja JIT compiled ops requires ninja JIT compiled ops requires ninja  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- ninjaninja .................................... [OKAY][OKAY] -------------------------------------------------- -------------------------------------------------- op name op name................ installed................ 
..installed compatible ..-------------------------------------------------- compatible -------------------------------------------------- cpu_adam ............... cpu_adam[YES] ..................... [OKAY][YES] ...... [OKAY] fused_adam ............. fused_adam[NO] .................... [OKAY] [NO] ....... [OKAY]fused_lamb ............. [NO] fused_lamb....... [OKAY]............. [NO] ....... [OKAY] sparse_attn ............ [NO]sparse_attn ....... ............[OKAY] [NO] .......transformer [OKAY]............ [NO] transformer....... ............[OKAY] [NO] ....... stochastic_transformer[OKAY] . [NO] .......stochastic_transformer [OKAY] . [NO] ....... [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- ---------------------------------------------------------------------------------------------------- DeepSpeed C++/CUDA extension op report DeepSpeed C++/CUDA extension op report---------------------------------------------------------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- -------------------------------------------------- DeepSpeed C++/CUDA extension op report--------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. 
JIT compiled ops requires ninja---------------------------------------------------------------------------------------------------- DeepSpeed C++/CUDA extension op report NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.JIT compiled ops requires ninja-------------------------------------------------- --------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. JIT compiled ops requires ninja-------------------------------------------------- JIT compiled ops requires ninja ninjaninjaninjaninja ........................................................................ [OKAY][OKAY] [OKAY] [OKAY] ------------------------------------------------------------------------------------------------------------------------------------------------------ -------------------------------------------------- op nameop nameop nameop name ................ ................................ ................installed installed installedinstalled.... ..compatible .. compatible compatible--------------------------------------------------compatible ---------------------------------------------------------------------------------------------------- -------------------------------------------------- cpu_adam ...............cpu_adam cpu_adam[YES] cpu_adam ............... ............... .....................[YES] [YES][OKAY][YES]...... [OKAY]............ [OKAY][OKAY] fused_adam fused_adam.............fused_adam fused_adam............. [NO] ............. [NO]............. .......[NO] ....... [NO][OKAY]....... .......[OKAY][OKAY] [OKAY]fused_lamb .............fused_lambfused_lamb fused_lamb[NO].......................... ....................[NO][NO] [OKAY][NO]....... ....... [OKAY] ....... 
/bin/sh: line 0: type: git: not found [OKAY] [OKAY] sparse_attn ............sparse_attn [NO] sparse_attnsparse_attn................... [NO]............ [OKAY] ............ .......[NO] [NO][OKAY].......transformer ...................[OKAY] transformer [OKAY][NO] ...................transformertransformer [OKAY] [NO] ............ ............ ....... [NO] [NO] [OKAY] ....... ....... stochastic_transformer [OKAY] [OKAY]stochastic_transformer . [NO]stochastic_transformer. stochastic_transformer.......[NO] .[OKAY]....... . [NO][OKAY][NO] .............. [OKAY][OKAY] **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 ninjaninjaninjaninja .................................... ..................[OKAY] .................. 
[OKAY][OKAY] --------------------------------------------------[OKAY]-------------------------------------------------- -------------------------------------------------- op name --------------------------------------------------................ op nameop name op name ................installed................ installed................installed.. .. ..installed compatiblecompatible compatible..---------------------------------------------------------------------------------------------------- compatible-------------------------------------------------- -------------------------------------------------- cpu_adamcpu_adam ..............................cpu_adam cpu_adam[YES][YES]............... ...........................[YES] [YES][OKAY]...... [OKAY] ......[OKAY] [OKAY] fused_adamfused_adam ..........................fused_adam fused_adam[NO].............[NO] .................... [NO] ....... [NO][OKAY] ....... [OKAY].......[OKAY] [OKAY]fused_lamb .............fused_lamb fused_lamb fused_lamb[NO] .......................... ....................[NO][NO] [OKAY].............. [NO] [OKAY] [OKAY] ....... [OKAY] sparse_attn ............ [NO]sparse_attn .......sparse_attn ............ sparse_attn[OKAY] ............ ............[NO] [NO][NO] transformer.............. .......[OKAY][OKAY]............ [OKAY]transformer[NO] transformer................... transformer............[NO][OKAY] .......[NO]............ stochastic_transformer[NO][OKAY] ....... [OKAY]........ stochastic_transformer [OKAY] [NO] stochastic_transformer....... . stochastic_transformer .[OKAY] [NO] [NO]........ .......[NO][OKAY] [OKAY] ....... [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... 
[OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown ****  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`.  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] async_io ............... [NO] ....... transformer_inference[NO] .. [NO] ....... [OKAY] utilstransformer_inference .................... [YES][NO] ............. [OKAY][OKAY] quantizer .............. [NO]utils ......................... [OKAY][YES] ...... [OKAY] -------------------------------------------------- quantizer .............. [NO] ....... 
[OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. utils .................. [YES] ...... [OKAY] async_io ............... [NO] ....... [NO] quantizer .............. [NO] ....... [OKAY] transformer_inference .. [NO] ....... [OKAY] -------------------------------------------------- utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] **** Git info for Megatron: git_hash=unknown git_branch=unknown **** quantizer .............. [NO] ....... 
[OKAY] -------------------------------------------------- /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** ------------------------------------------------------------------------------------------------------------------------------------------------------ DeepSpeed C++/CUDA extension op reportDeepSpeed C++/CUDA extension op report DeepSpeed C++/CUDA extension op report -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. DeepSpeed C++/CUDA extension op report -------------------------------------------------- -------------------------------------------------- ---------------------------------------------------------------------------------------------------- JIT compiled ops requires ninja JIT compiled ops requires ninja JIT compiled ops requires ninjaNOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. 
-------------------------------------------------- JIT compiled ops requires ninja /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown ******** Git info for Megatron: git_hash=unknown git_branch=unknown ****  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. torch cuda version ............... 11.1 nvcc version ..................... 11.2 async_io ............... [NO] ....... [NO] deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science transformer_inference .. [NO] ....... [OKAY] deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 utils .................. [YES] ...... 
[OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- ninjaninjaninjaninja ........................................................................ [OKAY][OKAY][OKAY][OKAY] ---------------------------------------------------------------------------------------------------- -------------------------------------------------- -------------------------------------------------- op name op name op name................ op name................................installed ................installed..installed installed..compatible.. ..compatiblecompatible-------------------------------------------------- compatible---------------------------------------------------------------------------------------------------- -------------------------------------------------- cpu_adam ............... cpu_adamcpu_adamcpu_adam[YES] ................................................... [YES][YES][YES][OKAY] .................. [OKAY][OKAY][OKAY] fused_adam ............. [NO]fused_adam fused_adamfused_adam ....... .......................... ............. [OKAY][NO][NO] [NO] ..............fused_lamb....... [OKAY][OKAY].............[OKAY] [NO] ....... fused_lambfused_lambfused_lamb [OKAY] ............. .......................... [NO][NO][NO] ..................... [OKAY][OKAY][OKAY] sparse_attn ............ [NO] .......sparse_attn [OKAY]sparse_attnsparse_attn............ ........................[NO] [NO]transformer [NO] ....... ................... ....... [OKAY] [OKAY] [OKAY][NO] transformer.......transformer transformer[OKAY]........................ ............ [NO] [NO] stochastic_transformer[NO] ....... ....... ....... [OKAY] .[OKAY] [OKAY] [NO] ....... stochastic_transformerstochastic_transformer[OKAY] stochastic_transformer .. . [NO] [NO] [NO] ....... .............. [OKAY][OKAY][OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`.  
 [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`.
async_io ............... [NO] ....... [NO]
transformer_inference .. [NO] ....... [OKAY]
utils .................. [YES] ...... [OKAY]
quantizer .............. [NO] ....... [OKAY]
--------------------------------------------------
DeepSpeed C++/CUDA extension op report
--------------------------------------------------
NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.
--------------------------------------------------
JIT compiled ops requires ninja
/bin/sh: line 0: type: git: not found
**** Git info for Megatron: git_hash=unknown git_branch=unknown ****
ninja .................. [OKAY]
--------------------------------------------------
op name ................ installed .. compatible
--------------------------------------------------
cpu_adam ............... [YES] ...... [OKAY]
fused_adam ............. [NO] ....... [OKAY]
fused_lamb ............. [NO] ....... [OKAY]
sparse_attn ............ [NO] ....... [OKAY]
transformer ............ [NO] ....... [OKAY]
stochastic_transformer . [NO] ....... [OKAY]
--------------------------------------------------
DeepSpeed general environment info:
torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']
torch version .................... 1.8.1
torch cuda version ............... 11.1
nvcc version ..................... 11.2
deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']
deepspeed info ................... 0.4.2+bc17042, bc17042, big-science
deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1
['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 JIT compiled ops requires ninja-------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. ---------------------------------------------------------------------------------------------------- JIT compiled ops requires ninjaDeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.-------------------------------------------------- -------------------------------------------------- async_io ............... [NO] ....... [NO] DeepSpeed C++/CUDA extension op reportJIT compiled ops requires ninja -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... 
['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found ninjaninjaninjaninja ........................................................................ [OKAY][OKAY][OKAY][OKAY] -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- **** Git info for Megatron: git_hash=unknown git_branch=unknown **** op nameop nameop name op name ................ ................................ ................ installed installedinstalled installed .. .... .. compatible compatible compatiblecompatible -------------------------------------------------- -------------------------------------------------- ---------------------------------------------------------------------------------------------------- cpu_adamcpu_adam cpu_adamcpu_adam............... .............................................[YES] [YES]......[YES][YES] ......[OKAY]............ [OKAY] [OKAY] [OKAY] fused_adam .............fused_adam fused_adam fused_adam[NO] ............. ............. ............. .......[NO] [NO] [NO] [OKAY] .............. ....... [OKAY][OKAY][OKAY] fused_lamb .............fused_lamb fused_lambfused_lamb [NO] ............. .......................... ....... [NO] [NO][NO] [OKAY] ....... .............. [OKAY][OKAY][OKAY] sparse_attn ............ [NO]sparse_attn sparse_attnsparse_attn....... ........................[OKAY]............ DeepSpeed general environment info: [NO][NO][NO] .....................transformer [OKAY][OKAY][OKAY]............ torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] [NO] transformer.......transformer transformer ............ [OKAY] ........................ 
[NO] [NO][NO]....... ..............stochastic_transformer[OKAY] torch version .................... 1.8.1 [OKAY][OKAY] . [NO] .......stochastic_transformerstochastic_transformerstochastic_transformer [OKAY] torch cuda version ............... 11.1 ... [NO][NO] [NO] .............. .......[OKAY][OKAY] [OKAY] nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown ****  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] /bin/sh: line 0: type: git: not found quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 
1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 ninjaninjaninjaninja .................................... .................................... [OKAY] [OKAY] [OKAY][OKAY] -------------------------------------------------- -------------------------------------------------- ----------------------------------------------------------------------------------------------------op nameop name op nameop name................................ ................ installed ................ installed ..installedinstalled .. .. ..compatible compatible compatible ------------------------------------------------------------------------------------------------------------------------------------------------------compatible -------------------------------------------------- cpu_adamcpu_adam cpu_adam ...............cpu_adam ............... [YES].............................. ...... [YES] [YES][OKAY][YES] .................. [OKAY] [OKAY] [OKAY] fused_adam ............. [NO] ....... fused_adam[OKAY]fused_adamfused_adam ............. ............. ............. fused_lamb[NO] .............[NO] [NO][NO] ....... ....... .......[OKAY] ....... [OKAY][OKAY]fused_lamb [OKAY] ............. [NO] .......fused_lambfused_lamb [OKAY].......................... sparse_attn[NO][NO] ................... .......[NO][OKAY] .......[OKAY]sparse_attn [OKAY] ............ [NO] transformer....... ............sparse_attn[OKAY] [NO]............sparse_attn transformer....... [NO] ............[OKAY] ............ ....... [NO] stochastic_transformer[NO] [OKAY] ....... ........ transformer[OKAY][NO][OKAY] .......  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. 
[OKAY]............ async_io ............... [NO] ....... [NO] transformer [NO] stochastic_transformer................... [NO].[OKAY] .......[NO] [OKAY]....... [OKAY] transformer_inference .. [NO] ....... [OKAY] stochastic_transformer stochastic_transformer . .[NO] [NO]....... .......[OKAY] [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- --------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- DeepSpeed C++/CUDA extension op reportJIT compiled ops requires ninja ---------------------------------------------------------------------------------------------------- DeepSpeed C++/CUDA extension op reportNOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.-------------------------------------------------- -------------------------------------------------- -------------------------------------------------- DeepSpeed C++/CUDA extension op reportNOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. JIT compiled ops requires ninja-------------------------------------------------- --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. JIT compiled ops requires ninjaNOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. 
Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] DeepSpeed general environment info: transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 quantizer .............. [NO] ....... [OKAY] torch cuda version ............... 11.1 nvcc version ..................... 11.2 -------------------------------------------------- deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 ---------------------------------------------------------------------------------------------------- --------------------------------------------------DeepSpeed C++/CUDA extension op report--------------------------------------------------DeepSpeed C++/CUDA extension op report --------------------------------------------------DeepSpeed C++/CUDA extension op report-------------------------------------------------- DeepSpeed C++/CUDA extension op report NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. 
Op compatibility means that your system meet the required dependencies to JIT install the op.-------------------------------------------------- -------------------------------------------------- -------------------------------------------------- --------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. JIT compiled ops requires ninjaJIT compiled ops requires ninja---------------------------------------------------------------------------------------------------- JIT compiled ops requires ninjaJIT compiled ops requires ninja -------------------------------------------------- --------------------------------------------------DeepSpeed C++/CUDA extension op report-------------------------------------------------- --------------------------------------------------DeepSpeed C++/CUDA extension op reportDeepSpeed C++/CUDA extension op report-------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.---------------------------------------------------------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. 
Op compatibility means that your system meet the required dependencies to JIT install the op.-------------------------------------------------- --------------------------------------------------JIT compiled ops requires ninjaNOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.-------------------------------------------------- JIT compiled ops requires ninja --------------------------------------------------JIT compiled ops requires ninja JIT compiled ops requires ninja  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- ------------------------------------------------------------------------------------------------------------------------------------------------------ DeepSpeed C++/CUDA extension op reportDeepSpeed C++/CUDA extension op reportDeepSpeed C++/CUDA extension op report ------------------------------------------------------------------------------------------------------------------------------------------------------ -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. 
DeepSpeed C++/CUDA extension op report ------------------------------------------------------------------------------------------------------------------------------------------------------ --------------------------------------------------JIT compiled ops requires ninjaJIT compiled ops requires ninja JIT compiled ops requires ninja NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 ninjaninjaninjaninja .................. ......................................................[OKAY] [OKAY][OKAY][OKAY]-------------------------------------------------- --------------------------------------------------op name---------------------------------------------------------------------------------------------------- ................op name op nameop nameinstalled................ installed.................. ................ ..compatible installed --------------------------------------------------compatible..installed compatible--------------------------------------------------.. --------------------------------------------------compatible DeepSpeed general environment info: cpu_adam ...............-------------------------------------------------- cpu_adam[YES] cpu_adam..................... ...............[YES][OKAY] torch install path ............... 
['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 [YES]...... ......[OKAY]cpu_adam ............... [OKAY] torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science fused_adam[YES] ............. ......[NO] fused_adam [OKAY] fused_adam....... deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 ............. .............[NO][OKAY] [NO]....... .......[OKAY]fused_lamb [OKAY]............. fused_adam[NO]fused_lambfused_lamb ....... ....................................... [OKAY][NO][NO][NO] ..................... [OKAY] [OKAY] [OKAY] sparse_attn fused_lamb............ .............[NO] [NO]....... sparse_attnsparse_attn .......[OKAY] ............ ............ transformer[NO][NO][OKAY] ............ .............. [OKAY][NO][OKAY] ....... [OKAY]transformertransformer ........................ [NO][NO]stochastic_transformer .......sparse_attn....... . [OKAY] ............ [OKAY][NO] .......[NO]stochastic_transformer [OKAY]stochastic_transformer ........ . [OKAY] [NO] [NO] .............. transformer [OKAY] [OKAY] ............ [NO] ....... [OKAY] ninjaninjaninjaninja ...................................................... .................. [OKAY] [OKAY][OKAY] [OKAY] stochastic_transformer . [NO] ....... [OKAY] ------------------------------------------------------------------------------------------------------------------------------------------------------ -------------------------------------------------- op name op name op name op name................................................ ................installedinstalledinstalled installed...... 
compatible..compatiblecompatible compatible---------------------------------------------------------------------------------------------------- -------------------------------------------------- -------------------------------------------------- cpu_adamcpu_adamcpu_adam cpu_adam............................................. ...............[YES][YES][YES] ......[YES]............ ......[OKAY][OKAY] [OKAY] [OKAY] fused_adamfused_adam fused_adamfused_adam ............. ............. [NO].............[NO]............. .......[NO][NO]....... [OKAY].......[OKAY] ....... [OKAY] [OKAY]fused_lamb fused_lamb .............fused_lamb............. [NO]fused_lamb[NO]............. ...........................[NO] [NO][OKAY][OKAY]....... .......[OKAY] [OKAY] sparse_attnsparse_attn ........................sparse_attn sparse_attn[NO][NO]............ ...................[NO]....... [OKAY][NO][OKAY]....... .......[OKAY] transformertransformer[OKAY] ........................transformer [NO]transformer[NO]............ ...................[NO]....... [NO][OKAY].......[OKAY] .......[OKAY] [OKAY] stochastic_transformerstochastic_transformer stochastic_transformerstochastic_transformer.. [NO][NO]. ...............[NO] [NO][OKAY][OKAY]....... .......[OKAY] [OKAY] ninjaninjaninjaninja ........................................................................ [OKAY][OKAY][OKAY] [OKAY] ------------------------------------------------------------------------------------------------------------------------------------------------------ -------------------------------------------------- op name op nameop name................ op name................ installed ................................ installed .. ..installed installed compatible compatible .... 
-------------------------------------------------- --------------------------------------------------compatiblecompatible ---------------------------------------------------------------------------------------------------- cpu_adam ...............cpu_adam cpu_adam [YES]cpu_adam ............... ............... .....................[YES] [YES][OKAY]......[YES] [OKAY]............ [OKAY][OKAY] fused_adam ............. fused_adam[NO]fused_adam fused_adam ................................. [NO][NO] [OKAY] ....... ............. ....... [OKAY] fused_lamb [NO][OKAY] ............. fused_lamb.......[NO] fused_lamb ............. .......[OKAY] ............. [NO] [OKAY] [NO]fused_lamb....... ....................[OKAY] [NO] .......[OKAY] [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformersparse_attnsparse_attn sparse_attn............ ............ ............[NO]............[NO] ..............[NO] [NO] [OKAY][OKAY].............. [OKAY][OKAY] transformer stochastic_transformer............ transformertransformer [NO] . ............ ...................[NO] [OKAY].......[NO][NO] [OKAY] ....... ....... stochastic_transformer[OKAY][OKAY] . [NO] .......stochastic_transformerstochastic_transformer [OKAY] .. [NO][NO] .............. [OKAY][OKAY] ninjaninjaninjaninja .................................... .................................... [OKAY][OKAY] [OKAY][OKAY] -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- op nameop nameop name op name ................ ................ ................................ installedinstalled installed ....installed .. compatiblecompatible .. 
compatible--------------------------------------------------compatible-------------------------------------------------- ---------------------------------------------------------------------------------------------------- cpu_adamcpu_adam cpu_adam...............cpu_adam............... ...............[YES] [YES] ...............[YES]...... ......[OKAY]......[YES] [OKAY][OKAY]...... [OKAY] fused_adam .............fused_adam fused_adam[NO]fused_adam ............. .......................... [NO]....... [NO][OKAY][NO] ....... ..............fused_lamb[OKAY] .............[OKAY][OKAY] fused_lamb[NO] fused_lamb.............fused_lamb....... [NO] [OKAY].......................... ....... [NO][NO][OKAY] .............. [OKAY][OKAY] sparse_attn ............ [NO] ....... [OKAY]sparse_attn ............ sparse_attnsparse_attn [NO]transformer ........................................... [NO] [NO][OKAY][NO] .............. transformer[OKAY].......[OKAY] ............[OKAY] transformerstochastic_transformer [NO] .............transformer....... [NO] [NO][OKAY] ............ ....... [NO].......stochastic_transformer[OKAY] .......[OKAY] .[OKAY] [NO]stochastic_transformer .......stochastic_transformer . [OKAY] . [NO] [NO]....... .......[OKAY] [OKAY] DeepSpeed general environment info:DeepSpeed general environment info: torch install pathtorch install path ............... ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch versiontorch version ........................................ 1.8.11.8.1 torch cuda versiontorch cuda version .............................. 11.111.1 nvcc versionnvcc version .......................................... 11.211.2 deepspeed install pathdeepspeed install path ...................... 
['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']
deepspeed info ................... 0.4.2+bc17042, bc17042, big-science
deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1
--------------------------------------------------
DeepSpeed C++/CUDA extension op report
--------------------------------------------------
NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.
--------------------------------------------------
JIT compiled ops requires ninja
--------------------------------------------------
ninja .................. [OKAY]
--------------------------------------------------
op name ................ installed .. compatible
--------------------------------------------------
cpu_adam ............... [YES] ...... [OKAY]
fused_adam ............. [NO] ....... [OKAY]
fused_lamb ............. [NO] ....... [OKAY]
sparse_attn ............ [NO] ....... [OKAY]
transformer ............ [NO] ....... [OKAY]
stochastic_transformer . [NO] ....... [OKAY]
--------------------------------------------------
DeepSpeed general environment info:
torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']
torch version .................... 1.8.1
torch cuda version ............... 11.1
nvcc version ..................... 11.2
deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']
deepspeed info ................... 0.4.2+bc17042, bc17042, big-science
deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1
 [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`.
async_io ............... [NO] ....... [NO]
transformer_inference .. [NO] ....... [OKAY]
utils .................. [YES] ...... [OKAY]
quantizer .............. [NO] ....... [OKAY]
--------------------------------------------------
/bin/sh: line 0: type: git: not found
**** Git info for Megatron: git_hash=unknown git_branch=unknown ****
ninja .................. [OKAY]
--------------------------------------------------
op name ................ installed .. compatible
--------------------------------------------------
cpu_adam ............... [YES] ...... [OKAY]
fused_adam ............. [NO] ....... [OKAY]
fused_lamb .............
.......[NO] [OKAY]....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] sparse_attn ............transformer [NO]............ [NO]....... .......[OKAY] [OKAY] transformer stochastic_transformer............ .[NO] [NO]....... ....... [OKAY][OKAY] stochastic_transformer . [NO] ....... [OKAY] DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 
11.2 ---------------------------------------------------------------------------------------------------- DeepSpeed C++/CUDA extension op reportDeepSpeed C++/CUDA extension op report ---------------------------------------------------------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.-------------------------------------------------- -------------------------------------------------- -------------------------------------------------- deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] DeepSpeed C++/CUDA extension op reportJIT compiled ops requires ninja JIT compiled ops requires ninja -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`.  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. deepspeed info ................... 
0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found async_io ............... async_io[NO] ...................... [NO][NO] ....... [NO] transformer_inferencetransformer_inference .... [NO][NO] .............. [OKAY][OKAY] utilsutils .................................... [YES][YES] ............ [OKAY][OKAY] quantizerquantizer ............................ [NO][NO] .............. [OKAY][OKAY] ---------------------------------------------------------------------------------------------------- -------------------------------------------------- DeepSpeed C++/CUDA extension op report ---------------------------------------------------------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.DeepSpeed C++/CUDA extension op report -------------------------------------------------- -------------------------------------------------- --------------------------------------------------JIT compiled ops requires ninjaNOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. --------------------------------------------------DeepSpeed C++/CUDA extension op report JIT compiled ops requires ninja-------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. 
---------------------------------------------------------------------------------------------------- JIT compiled ops requires ninja DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja **** Git info for Megatron: git_hash=unknown git_branch=unknown ******** Git info for Megatron: git_hash=unknown git_branch=unknown ******** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 
0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.-------------------------------------------------- --------------------------------------------------DeepSpeed C++/CUDA extension op report JIT compiled ops requires ninja-------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. 
Op compatibility means that your system meet the required dependencies to JIT install the op.-------------------------------------------------- -------------------------------------------------- -------------------------------------------------- DeepSpeed C++/CUDA extension op reportJIT compiled ops requires ninja --------------------------------------------------DeepSpeed C++/CUDA extension op report NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.-------------------------------------------------- --------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. JIT compiled ops requires ninja-------------------------------------------------- JIT compiled ops requires ninja ninjaninjaninjaninja ........................................................................ [OKAY][OKAY][OKAY][OKAY] ninjaninjaninjaninja ...................................................... .................. [OKAY] [OKAY] [OKAY][OKAY] ---------------------------------------------------------------------------------------------------- ----------------------------------------------------------------------------------------------------op name ------------------------------------------------------------------------------------------------------------------------------------------------------ -------------------------------------------------- op name op name op name................ ................ ................ ................installed installed installed ..installed .. ..compatible .. 
compatiblecompatible op nameop name --------------------------------------------------compatible-------------------------------------------------- -------------------------------------------------- -------------------------------------------------- op name................op name ................ ................................installedinstalled .. installedinstalled.. compatiblecompatible .... -------------------------------------------------- compatible--------------------------------------------------compatible cpu_adamcpu_adam .............................. cpu_adamcpu_adam[YES] [YES].................................... ......[YES][YES][OKAY] -------------------------------------------------- -------------------------------------------------- cpu_adam ............... [YES]cpu_adam ......cpu_adam............... cpu_adam...............[YES][OKAY] ......[OKAY]...... [OKAY] [OKAY] ...............[YES]...... ......[YES][OKAY] [OKAY]...... fused_adam ............. fused_adam[NO] .................... fused_adam[NO] [OKAY]fused_adam ............. ....... ............. [NO] fused_lamb[NO][OKAY] ....... fused_adam[OKAY] ............. ....... [OKAY] fused_lamb [NO] [OKAY].................... ............. [NO] ....... fused_adam[OKAY] [NO]fused_lamb[OKAY] ....... .............fused_lamb [OKAY][NO]............. ............. fused_adam[NO]fused_lamb fused_adam................................. [NO].............[NO][OKAY] .......[NO] [OKAY] ....... [OKAY] ..............[NO] [OKAY][OKAY] fused_lamb ....... .............[OKAY] fused_lamb[NO] sparse_attn ............ [NO] sparse_attn....... ............[OKAY] .................... fused_lamb [NO] sparse_attn.................... [OKAY]............ [NO] .......sparse_attn transformer[OKAY]............ [NO][OKAY][NO] ....... .......[OKAY] [OKAY] sparse_attn............[NO] transformer ............ [NO]................... [NO] .......[OKAY][NO] [OKAY].............. sparse_attn ............transformer [NO]............ 
.......[NO] sparse_attn[OKAY].......sparse_attn transformer[OKAY][OKAY] stochastic_transformer ............ [NO] transformer........ stochastic_transformer [NO][OKAY] ............ ............[OKAY]transformer............ ........[NO] [NO][OKAY]stochastic_transformer ....... [NO] ............ [NO]stochastic_transformer....... [NO][OKAY]........ ....... [OKAY][OKAY]. ....... [NO][OKAY]transformer [OKAY]................... [NO] .......stochastic_transformer [OKAY] . [NO] ....... [OKAY] transformer[OKAY] [NO]stochastic_transformer............ ....... [NO].[OKAY] .......[NO] .......[OKAY]stochastic_transformer [OKAY] . stochastic_transformer[NO] ....... .[OKAY] [NO] ....... [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`.  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ...............async_io [NO]............... .......[NO] [NO]....... [NO] transformer_inference .. [NO] .......transformer_inference [OKAY].. [NO] ....... [OKAY] utils .................. [YES] utils...... ..................[OKAY] [YES] ...... [OKAY] quantizer .............. [NO] quantizer....... ..............[OKAY] [NO] ....... [OKAY]-------------------------------------------------- -------------------------------------------------- DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 
11.1 -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. ---------------------------------------------------------------------------------------------------- nvcc version ..................... 11.2 JIT compiled ops requires ninjaDeepSpeed C++/CUDA extension op report -------------------------------------------------- -------------------------------------------------- DeepSpeed C++/CUDA extension op reportNOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. ---------------------------------------------------------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.JIT compiled ops requires ninja -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja /bin/sh: line 0: type: git: not found ninjaninjaninjaninja .................. .................. 
.................................... [OKAY][OKAY] [OKAY] [OKAY]-------------------------------------------------- -------------------------------------------------- --------------------------------------------------op name --------------------------------------------------op name **** Git info for Megatron: git_hash=unknown git_branch=unknown **** op name ................ ................ op name................ installed installedinstalled.................. ..compatible .. installed compatible compatible-------------------------------------------------- .. -------------------------------------------------- --------------------------------------------------compatible -------------------------------------------------- cpu_adam cpu_adam............... ...............[YES] cpu_adam[YES] ...... cpu_adam............... ......[OKAY][YES] ...............[OKAY]...... [YES][OKAY] ...... [OKAY] fused_adam .............fused_adam [NO]............. .......[NO] fused_adam [OKAY] fused_adam....... ............. [OKAY].............fused_lamb[NO] [NO] ...........................fused_lamb [NO][OKAY][OKAY] ............. .......[NO] [OKAY]fused_lamb.......fused_lamb ............. [OKAY][NO]............. .......[NO] [OKAY] ....... [OKAY]sparse_attn ............ [NO] sparse_attn....... ............[OKAY] [NO] sparse_attn.......transformer ............sparse_attn............[OKAY] [NO]............[NO]transformer ..............[NO] ............ [OKAY][OKAY] ....... [NO] [OKAY]....... transformer stochastic_transformer [OKAY]transformer ............ .............[NO] stochastic_transformer [NO][NO] ....... ....... ....... . [OKAY][OKAY] [OKAY] [NO] ....... stochastic_transformer[OKAY] stochastic_transformer . .[NO] [NO]....... [OKAY]....... 
[OKAY] ---------------------------------------------------------------------------------------------------- --------------------------------------------------DeepSpeed C++/CUDA extension op reportDeepSpeed C++/CUDA extension op report DeepSpeed C++/CUDA extension op report -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. DeepSpeed C++/CUDA extension op reportNOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- -------------------------------------------------- -------------------------------------------------- --------------------------------------------------JIT compiled ops requires ninja JIT compiled ops requires ninja JIT compiled ops requires ninjaNOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown ****  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... 
async_io[NO] ...................... [NO][NO] ....... [NO] transformer_inference .. [NO] transformer_inference....... ..[OKAY] [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] utils .................. [YES] ......quantizer [OKAY].............. [NO] ....... [OKAY] quantizer .............. [NO]-------------------------------------------------- ....... [OKAY] -------------------------------------------------- /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** ninjaninjaninjaninja ...................................................... .................. [OKAY][OKAY][OKAY][OKAY] ------------------------------------------------------------------------------------------------------------------------------------------------------ -------------------------------------------------- op nameop name op nameop name ................ ................................ ................ installed installedinstalled installed.. ......compatible compatible compatible compatible -------------------------------------------------- ---------------------------------------------------------------------------------------------------- -------------------------------------------------- cpu_adamcpu_adamcpu_adam ............................................. [YES][YES][YES] cpu_adam ............ ...... ...............[OKAY][OKAY][OKAY] [YES] ...... [OKAY] fused_adamfused_adam fused_adam ............. ............. ............. [NO] [NO] [NO]....... ..............[OKAY] [OKAY][OKAY] fused_adam fused_lambfused_lamb fused_lamb.......................... ............. ............. [NO][NO][NO] .......[NO].............. .......[OKAY][OKAY][OKAY] [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. 
Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja fused_lamb ............. [NO] ....... sparse_attnsparse_attnsparse_attn[OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. .................................... [NO] [NO] [NO] ....... ....... ....... [OKAY] [OKAY] [OKAY] -------------------------------------------------- --------------------------------------------------JIT compiled ops requires ninja DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja transformertransformer transformer .................................... [NO][NO][NO] ..................... [OKAY][OKAY][OKAY] sparse_attn stochastic_transformerstochastic_transformer stochastic_transformer .............. . [NO] [NO][NO] [NO] ....... ..................... [OKAY][OKAY][OKAY] [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... 
/bin/sh: line 0: type: git: not found
**** Git info for Megatron: git_hash=unknown git_branch=unknown ****

--------------------------------------------------
DeepSpeed C++/CUDA extension op report
--------------------------------------------------
NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.
--------------------------------------------------
JIT compiled ops requires ninja

ninja .................. [OKAY]
--------------------------------------------------
op name ................ installed .. compatible
--------------------------------------------------
cpu_adam ............... [YES] ...... [OKAY]
fused_adam ............. [NO] ....... [OKAY]
fused_lamb ............. [NO] ....... [OKAY]
sparse_attn ............ [NO] ....... [OKAY]
transformer ............ [NO] ....... [OKAY]
stochastic_transformer . [NO] ....... [OKAY]

[WARNING] async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`.
async_io ............... [NO] ....... [NO]
transformer_inference .. [NO] ....... [OKAY]
utils .................. [YES] ...... [OKAY]
quantizer .............. [NO] ....... [OKAY]
--------------------------------------------------

DeepSpeed general environment info:
torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']
torch version .................... 1.8.1
torch cuda version ............... 11.1
nvcc version ..................... 11.2
deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']
deepspeed info ................... 0.4.2+bc17042, bc17042, big-science
deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1
[OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. quantizer .............. [NO] ....... [OKAY] async_io ............... [NO] ....... [NO] -------------------------------------------------- transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- ninjaninjaninja ninja .................. .................................... ..................[OKAY] [OKAY]--------------------------------------------------[OKAY][OKAY] --------------------------------------------------op name-------------------------------------------------- -------------------------------------------------- ................ op nameop name op name ................................installed ................installed..installed installedcompatible.. ....compatible-------------------------------------------------- compatible-------------------------------------------------- -------------------------------------------------- compatible -------------------------------------------------- cpu_adamcpu_adam cpu_adam ............... ............... ............... [YES] [YES] [YES] ...... ...... cpu_adam...... [OKAY] ...............[OKAY][OKAY] [YES] ...... 
fused_adam[OKAY] .............fused_adam fused_adam [NO] ................................. [NO][OKAY][NO] .............. fused_adam fused_lamb[OKAY] [OKAY] ............. ............. [NO][NO]fused_lamb fused_lamb ....... ............. ....... .............[NO][OKAY] [OKAY][NO]....... .......[OKAY] [OKAY] fused_lambsparse_attn ......................... [NO][NO] .......sparse_attn .......[OKAY] sparse_attn[OKAY] ............ ............ [NO][NO]transformer .......................... [OKAY][NO][OKAY] ....... transformersparse_attntransformer[OKAY] ............ ........................ [NO][NO]stochastic_transformer[NO] ....... .............. .[OKAY][OKAY] [NO][OKAY] ....... stochastic_transformer[OKAY]transformerstochastic_transformer .. ............[NO][NO] [NO] ....... ....... .......[OKAY] [OKAY] [OKAY] stochastic_transformer . [NO] ....... [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown ******** Git info for Megatron: git_hash=unknown git_branch=unknown ****  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... 
[OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed general environment info:DeepSpeed general environment info: torch install pathtorch install path .............................. ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch versiontorch version ........................................ 1.8.11.8.1 torch cuda versiontorch cuda version .............................. 11.111.1 nvcc versionnvcc version .......................................... 11.211.2 deepspeed install pathdeepspeed install path ...................... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed infodeepspeed info ...................................... 0.4.2+bc17042, bc17042, big-science0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w.deepspeed wheel compiled w. ............ torch 1.8, cuda 11.1torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown ****  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... 
[OKAY] -------------------------------------------------- /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown ****  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... 
[OKAY] -------------------------------------------------- /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown ****  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`.  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`.  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- -------------------------------------------------- DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... 
['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 -------------------------------------------------- ----------------------------------------------------------------------------------------------------DeepSpeed C++/CUDA extension op report-------------------------------------------------- DeepSpeed C++/CUDA extension op reportDeepSpeed C++/CUDA extension op report-------------------------------------------------- DeepSpeed C++/CUDA extension op report NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.-------------------------------------------------- ---------------------------------------------------------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. --------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.JIT compiled ops requires ninja-------------------------------------------------- JIT compiled ops requires ninja-------------------------------------------------- JIT compiled ops requires ninja DeepSpeed general environment info: torch install path ............... 
DeepSpeed general environment info:['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] DeepSpeed general environment info:torch version .................... 1.8.1torch install path ...............torch cuda version torch install path ............... 11.1............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']nvcc version ..................... torch version11.2 ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']....................deepspeed install path 1.8.1........... torch version ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']torch cuda version.................... deepspeed info1.8.1............... ...................11.1 torch cuda version 0.4.2+bc17042, bc17042, big-science nvcc version ............... deepspeed wheel compiled w......................11.1 ......11.2 nvcc versiontorch 1.8, cuda 11.1deepspeed install path ................................ 11.2 deepspeed install path['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] ...........deepspeed info ...................['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] 0.4.2+bc17042, bc17042, big-sciencedeepspeed info deepspeed wheel compiled w.................... ...... 0.4.2+bc17042, bc17042, big-sciencetorch 1.8, cuda 11.1 deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... 
[NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- ninjaninjaninjaninja .................. .................. 
....................................[OKAY] [OKAY][OKAY][OKAY] -------------------------------------------------- ----------------------------------------------------------------------------------------------------op name --------------------------------------------------................op nameop name installed................ ................op nameinstalled.. installed..compatible ..................compatible-------------------------------------------------- compatible installed -------------------------------------------------- --------------------------------------------------.. cpu_adam ...............compatible [YES]cpu_adam-------------------------------------------------- .....................cpu_adam ............... [YES][OKAY][YES] ............ [OKAY][OKAY] cpu_adam ............... [YES] ......fused_adam [OKAY].............fused_adam fused_adam[NO]............. ....................[NO] [OKAY][NO]....... .......[OKAY] [OKAY]fused_lambfused_adam fused_lamb .......................... fused_lamb [NO] .............[NO]............. [NO] .............. [NO] ....... .......[OKAY] [OKAY] [OKAY][OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attnsparse_attn sparse_attn ............ ........................ [NO] [NO] [NO] .......sparse_attn ....... ................... [OKAY] [OKAY][OKAY][NO] transformer transformertransformer............ ....... ........................[NO][OKAY] [NO][NO]....... ..............[OKAY] transformer[OKAY][OKAY] stochastic_transformer............ stochastic_transformer stochastic_transformer [NO]. ..[NO] ....... [NO][NO] ....... [OKAY].............. [OKAY][OKAY][OKAY] stochastic_transformer . [NO] ....... [OKAY] DeepSpeed general environment info:DeepSpeed general environment info: torch install pathtorch install path .............................. 
['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch versiontorch version ........................................ 1.8.11.8.1 torch cuda version torch cuda version............... ...............11.1 11.1nvcc version nvcc version..................... .....................11.2 11.2deepspeed install path deepspeed install path........... ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']deepspeed info deepspeed info................... ...................0.4.2+bc17042, bc17042, big-science 0.4.2+bc17042, bc17042, big-sciencedeepspeed wheel compiled w. deepspeed wheel compiled w....... ...... torch 1.8, cuda 11.1torch 1.8, cuda 11.1 DeepSpeed general environment info:DeepSpeed general environment info: torch install pathtorch install path ............... ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch versiontorch version ........................................ 1.8.11.8.1 torch cuda versiontorch cuda version ............... ...............11.1 11.1nvcc version nvcc version..................... .....................11.2 11.2deepspeed install path deepspeed install path........... ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']deepspeed info deepspeed info................... ................... 0.4.2+bc17042, bc17042, big-science0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w.deepspeed wheel compiled w. ............ torch 1.8, cuda 11.1torch 1.8, cuda 11.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. 
Can be fixed by: `apt install libaio-dev`.  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO]async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] transformer_inferenceutils .................... [NO][YES] ............. [OKAY][OKAY] quantizer utils.............. ..................[NO] [YES]....... ......[OKAY] [OKAY] --------------------------------------------------quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_ioasync_io .............................. [NO][NO] .............. 
[NO][NO] transformer_inference ..transformer_inference [NO].. .......[NO] [OKAY].......  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. utils .................. [YES] ...... [OKAY] [OKAY]quantizer async_io ............... [NO] ....... [NO] .............. [NO] ....... [OKAY] transformer_inference .. [NO] ....... [OKAY] -------------------------------------------------- utils .................. [YES] ...... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed general environment info:DeepSpeed general environment info: torch install pathtorch install path .............................. ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch versiontorch version ........................................ 1.8.11.8.1 torch cuda versiontorch cuda version .............................. 11.111.1 nvcc versionnvcc version .......................................... 11.211.2 deepspeed install pathdeepspeed install path ...................... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed infodeepspeed info ...................................... 
0.4.2+bc17042, bc17042, big-science0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w.deepspeed wheel compiled w. ............ torch 1.8, cuda 11.1torch 1.8, cuda 11.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 -------------------------------------------------- DeepSpeed C++/CUDA extension op report DeepSpeed general environment info: -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 
1.8.1
torch cuda version ............... 11.1
nvcc version ..................... 11.2
deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']
deepspeed info ................... 0.4.2+bc17042, bc17042, big-science
deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1
DeepSpeed general environment info:
torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']
torch version .................... 1.8.1
torch cuda version ............... 11.1
nvcc version ..................... 11.2
deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']
deepspeed info ................... 0.4.2+bc17042, bc17042, big-science
deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1
 [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`.
async_io ............... [NO] ....... [NO]
transformer_inference .. [NO] ....... [OKAY]
utils .................. [YES] ...... [OKAY]
quantizer .............. [NO] ....... [OKAY]
--------------------------------------------------
--------------------------------------------------
DeepSpeed C++/CUDA extension op report
--------------------------------------------------
NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.
--------------------------------------------------
JIT compiled ops requires ninja
--------------------------------------------------
ninja .................. [OKAY]
--------------------------------------------------
op name ................ installed .. compatible
--------------------------------------------------
cpu_adam ............... [YES] ...... [OKAY]
fused_adam ............. [NO] ....... [OKAY]
fused_lamb ............. [NO] ....... [OKAY]
sparse_attn ............ [NO] ....... [OKAY]
transformer ............ [NO] ....... [OKAY]
stochastic_transformer . [NO] ....... [OKAY]
--------------------------------------------------
/bin/sh: line 0: type: git: not found
**** Git info for Megatron: git_hash=unknown git_branch=unknown ****
> setting tensorboard ...
[OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja /bin/sh: line 0: type: git: not found ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... 
[OKAY] **** Git info for Megatron: git_hash=unknown git_branch=unknown ****  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`.  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`.async_io ............... [NO] ....... [NO] async_iotransformer_inference ................. [NO][NO] .............. [NO][OKAY] utilstransformer_inference .................... [YES][NO] ............. [OKAY][OKAY] quantizer .............. [NO] utils....... ..................[OKAY] [YES] ......-------------------------------------------------- [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... 
torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown ******** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown ****  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path DeepSpeed general environment info:........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ...................torch install path 0.4.2+bc17042, bc17042, big-science ............... deepspeed wheel compiled w. ...... 
torch 1.8, cuda 11.1 ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 **** Git info for Megatron: git_hash=unknown git_branch=unknown ****  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. -------------------------------------------------- async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] /bin/sh: line 0: type: git: not found utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- **** Git info for Megatron: git_hash=unknown git_branch=unknown ****  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... 
[OKAY] -------------------------------------------------- /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown ******** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 
11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... 
[OKAY] -------------------------------------------------- /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown ******** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown ******** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info:DeepSpeed general environment info: torch install pathtorch install path .............................. ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch versiontorch version ........................................ 1.8.11.8.1 torch cuda versiontorch cuda version .............................. 11.111.1 nvcc versionnvcc version .......................................... 11.211.2 deepspeed install pathdeepspeed install path ...................... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed infodeepspeed info ...................................... 0.4.2+bc17042, bc17042, big-science0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w.deepspeed wheel compiled w. ............ 
torch 1.8, cuda 11.1torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... 
torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info:DeepSpeed general environment info: /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found torch install pathtorch install path .............................. ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch versiontorch version ........................................ 1.8.11.8.1 torch cuda versiontorch cuda version .............................. 11.111.1 nvcc versionnvcc version .......................................... 11.211.2 deepspeed install pathdeepspeed install path ...................... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed infodeepspeed info ...................................... 0.4.2+bc17042, bc17042, big-science0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w.deepspeed wheel compiled w. ............ 
torch 1.8, cuda 11.1torch 1.8, cuda 11.1 **** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 **** Git info for Megatron: git_hash=unknown git_branch=unknown **** torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 **** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info: DeepSpeed general environment info:torch install path ............... torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'].................... 1.8.1 torch version torch cuda version.................... ...............1.8.1 11.1 torch cuda versionnvcc version .................................... 11.111.2 nvcc versiondeepspeed install path ................................ 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... 
DeepSpeed general environment info:['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version ....................torch install path 1.8.1............... torch cuda version ............... 11.1 ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']nvcc version ..................... 11.2torch version deepspeed install path.................... ...........1.8.1 ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']torch cuda version deepspeed info............... ...................11.1 nvcc version0.4.2+bc17042, bc17042, big-science .....................deepspeed wheel compiled w. 11.2...... deepspeed install pathtorch 1.8, cuda 11.1 ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown ****  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_ioasync_io .............................. [NO][NO] .............. [NO][NO] transformer_inference .. [NO] ....... [OKAY] transformer_inference .. [NO] utils....... ..................[OKAY] [YES] ...... [OKAY] utils .................. 
[YES]quantizer .................... [OKAY][NO] ....... [OKAY] quantizer --------------------------------------------------.............. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed general environment info:DeepSpeed general environment info: torch install pathtorch install path .............................. ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch versiontorch version ........................................ 1.8.11.8.1 torch cuda versiontorch cuda version ............... ...............11.1 11.1nvcc version nvcc version..................... .....................11.2 11.2deepspeed install path deepspeed install path........... ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info deepspeed info................... ...................0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w.0.4.2+bc17042, bc17042, big-science ......deepspeed wheel compiled w. torch 1.8, cuda 11.1...... torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown ******** Git info for Megatron: git_hash=unknown git_branch=unknown **** -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. 
-------------------------------------------------- JIT compiled ops requires ninja ---------------------------------------------------------------------------------------------------- DeepSpeed C++/CUDA extension op report DeepSpeed C++/CUDA extension op report---------------------------------------------------------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.-------------------------------------------------- --------------------------------------------------DeepSpeed C++/CUDA extension op reportNOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. JIT compiled ops requires ninja-------------------------------------------------- -------------------------------------------------- JIT compiled ops requires ninjaNOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown ****  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`.  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO]async_io ...................... [NO] [NO] ....... [NO] transformer_inference .. transformer_inference[NO] ......... [NO][OKAY] ....... [OKAY] utils .................. [YES]utils ........................ [OKAY][YES] ...... [OKAY] quantizer ..............quantizer [NO].............. .......[NO] [OKAY]....... 
[OKAY] -------------------------------------------------- -------------------------------------------------- /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** ninjaninjaninjaninja .................................... .................................... [OKAY] [OKAY] [OKAY][OKAY] ---------------------------------------------------------------------------------------------------- ---------------------------------------------------------------------------------------------------- op nameop name op nameop name ................ ................................ ................ installedinstalledinstalledinstalled ........ compatiblecompatiblecompatiblecompatible -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- cpu_adamcpu_adam cpu_adamcpu_adam .............................. ............... ............... [YES][YES] [YES] [YES] ............ ............ [OKAY][OKAY][OKAY][OKAY] fused_adamfused_adamfused_adam fused_adam............. ............. ............. .............[NO][NO] [NO][NO].............. ..............[OKAY][OKAY] [OKAY][OKAY] fused_lambfused_lambfused_lamb ..........................fused_lamb............. 
--------------------------------------------------
DeepSpeed C++/CUDA extension op report
--------------------------------------------------
NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.
--------------------------------------------------
JIT compiled ops requires ninja
ninja .................. [OKAY]
--------------------------------------------------
op name ................ installed .. compatible
--------------------------------------------------
cpu_adam ............... [YES] ...... [OKAY]
fused_adam ............. [NO] ....... [OKAY]
fused_lamb ............. [NO] ....... [OKAY]
sparse_attn ............ [NO] ....... [OKAY]
transformer ............ [NO] ....... [OKAY]
stochastic_transformer . [NO] ....... [OKAY]
--------------------------------------------------
 [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`.
async_io ............... [NO] ....... [NO]
transformer_inference .. [NO] ....... [OKAY]
utils .................. [YES] ...... [OKAY]
quantizer .............. [NO] ....... [OKAY]
--------------------------------------------------
DeepSpeed general environment info:
torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']
torch version .................... 1.8.1
torch cuda version ............... 11.1
nvcc version ..................... 11.2
deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']
deepspeed info ................... 0.4.2+bc17042, bc17042, big-science
deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1
/bin/sh: line 0: type: git: not found
**** Git info for Megatron: git_hash=unknown git_branch=unknown ****
JIT compiled ops requires ninja---------------------------------------------------------------------------------------------------- JIT compiled ops requires ninja DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** ninjaninjaninjaninja .................................... ..................[OKAY] .................. [OKAY] [OKAY] [OKAY] -------------------------------------------------- -------------------------------------------------- ----------------------------------------------------------------------------------------------------op name op name op name................................op name ................installedinstalled................ installed installed.... compatible....compatible --------------------------------------------------compatible-------------------------------------------------- compatible-------------------------------------------------- -------------------------------------------------- cpu_adamcpu_adam .............................. [YES][YES]cpu_adam ...... ......cpu_adam............... [OKAY]...............[OKAY][YES] [YES]...... ......[OKAY] [OKAY] fused_adamfused_adam .......................... [NO][NO] .......fused_adam....... [OKAY].............[OKAY] fused_adam [NO]fused_lamb.............fused_lamb .......[NO].......................... [OKAY].......[NO][NO] [OKAY]..............fused_lamb [OKAY][OKAY]............. fused_lamb [NO] .................... [NO][OKAY] ....... [OKAY]sparse_attn sparse_attn............ 
............[NO] .......[NO] [OKAY]....... sparse_attn[OKAY] transformer ............ ............transformersparse_attn[NO] [NO]............................... .......[NO][OKAY] [NO] .......[OKAY]....... transformer[OKAY][OKAY] ............stochastic_transformer transformerstochastic_transformer [NO] ............. . .......[NO][NO] [OKAY][NO].............. [OKAY].......[OKAY] [OKAY] stochastic_transformer stochastic_transformer. [NO]. .......[NO] [OKAY] ....... [OKAY] DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 
0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report-------------------------------------------------- -------------------------------------------------- -------------------------------------------------- DeepSpeed C++/CUDA extension op report NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.DeepSpeed C++/CUDA extension op report ---------------------------------------------------------------------------------------------------- -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.JIT compiled ops requires ninjaNOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. ---------------------------------------------------------------------------------------------------- JIT compiled ops requires ninjaJIT compiled ops requires ninja ninjaninjaninjaninja ........................................................................ 
[OKAY][OKAY][OKAY][OKAY] -------------------------------------------------- -------------------------------------------------- -------------------------------------------------- --------------------------------------------------op name op name op name................................op name installedinstalled................................ ..installed.. compatibleinstalledcompatible.. --------------------------------------------------..--------------------------------------------------compatible compatible-------------------------------------------------- -------------------------------------------------- cpu_adamcpu_adam .............................. cpu_adam[YES][YES] cpu_adam........................... [YES][OKAY]............... [OKAY] ......[YES] [OKAY]...... [OKAY] fused_adam ............. [NO] .......fused_adam [OKAY].............fused_adam fused_adam [NO] ............. fused_lamb.................... .............[NO][NO][OKAY] .......[NO]....... fused_lamb[OKAY].......[OKAY] .............[OKAY] fused_lamb[NO] fused_lamb ................................. [NO][NO][OKAY] ..............sparse_attn [OKAY][OKAY]............ [NO] ....... [OKAY] sparse_attntransformer ........................ [NO][NO] sparse_attn .......sparse_attn ....... ............ [OKAY] ............[NO][OKAY] [NO]transformer....... ...................[OKAY]stochastic_transformer [NO][OKAY] . .......transformer[NO] [OKAY]transformer....... ............ ............[OKAY]stochastic_transformer[NO] [NO]....... . ....... [OKAY] [NO] [OKAY] .......stochastic_transformer [OKAY] stochastic_transformer . .[NO] [NO]....... .......[OKAY] [OKAY] DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... 
['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown ****  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... 
[OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown ****  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... 
[OKAY] -------------------------------------------------- /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown ****  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_ioasync_io .............................. [NO][NO] .............. [NO][NO] transformer_inferencetransformer_inference .... [NO][NO] .............. [OKAY][OKAY] utilsutils .................................... [YES][YES] ............ [OKAY][OKAY] quantizerquantizer ............................ [NO][NO] .............. 
[OKAY][OKAY] -------------------------------------------------- -------------------------------------------------- /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info:DeepSpeed general environment info:DeepSpeed general environment info: torch install pathtorch install pathtorch install path ............................................. ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version torch versiontorch version.................... ........................................1.8.1 1.8.11.8.1 torch cuda version torch cuda version...............torch cuda version 11.1.............................. nvcc version11.111.1 .....................nvcc versionnvcc version 11.2.......................................... deepspeed install path11.211.2 ...........deepspeed install pathdeepspeed install path ......................['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']deepspeed info['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info...................deepspeed info ...................0.4.2+bc17042, bc17042, big-science................... 0.4.2+bc17042, bc17042, big-sciencedeepspeed wheel compiled w.0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w.......deepspeed wheel compiled w. ......torch 1.8, cuda 11.1...... torch 1.8, cuda 11.1torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... 
['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... 
torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... 
torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** ---------------------------------------------------------------------------------------------------- DeepSpeed C++/CUDA extension op report DeepSpeed C++/CUDA extension op report-------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.-------------------------------------------------- ---------------------------------------------------------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.JIT compiled ops requires ninja --------------------------------------------------DeepSpeed C++/CUDA extension op report-------------------------------------------------- --------------------------------------------------JIT compiled ops requires ninja DeepSpeed C++/CUDA extension op report NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. 
---------------------------------------------------------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.JIT compiled ops requires ninja -------------------------------------------------- JIT compiled ops requires ninja ninjaninjaninjaninja .................................... .................................... [OKAY][OKAY][OKAY] [OKAY]------------------------------------------------------------------------------------------------------------------------------------------------------ op nameop nameop name-------------------------------------------------- ................................................ op nameinstalledinstalledinstalled .................... .. compatible compatibleinstalledcompatible-------------------------------------------------- ..---------------------------------------------------------------------------------------------------- compatible -------------------------------------------------- cpu_adam ............... [YES] cpu_adam......cpu_adam ...............[OKAY]cpu_adam ............... [YES]............... [YES] ...... [YES] ...... [OKAY] ......fused_adam [OKAY] .............[OKAY] [NO] ....... [OKAY] fused_adamfused_lamb fused_adam ............. ............. ............. fused_adam[NO] [NO] [NO] .................... ....... ....... [OKAY] [NO] [OKAY][OKAY] ....... fused_lamb[OKAY]fused_lamb .......................... [NO][NO] fused_lamb....... .......sparse_attn.............[OKAY] [OKAY] ............[NO] [NO]....... ....... [OKAY][OKAY] transformersparse_attnsparse_attn .................................... [NO][NO][NO] ..................... [OKAY][OKAY][OKAY]sparse_attn ............ transformerstochastic_transformer [NO]transformer ............ .................... [NO] [NO][OKAY]....... [NO] ....... ....... [OKAY] [OKAY] transformer[OKAY] ............ 
stochastic_transformer[NO] stochastic_transformer ......... [OKAY][NO][NO] .............. [OKAY][OKAY] stochastic_transformer . [NO] ....... [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 
0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... 
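The `async_io ... [NO]` line in the capability report above is gated on libaio, and the warning's suggested fix is `apt install libaio-dev`. A quick pre-check for the library (this check command is an addition for illustration, not part of the log; async_io is only needed for DeepSpeed's asynchronous disk/NVMe offload path, so `[NO]` is harmless for this run):

```shell
# Check whether libaio is already available on this node; if not, the
# log's suggested fix on Debian/Ubuntu is: apt install libaio-dev (root).
if ldconfig -p 2>/dev/null | grep -qi libaio; then
  echo "libaio present"
else
  echo "libaio missing; on Debian/Ubuntu: apt install libaio-dev"
fi
```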
/bin/sh: line 0: type: git: not found
**** Git info for Megatron: git_hash=unknown git_branch=unknown ****
> initializing tensor model parallel with size 4
> initializing pipeline model parallel with size 8
> setting random seeds to 42 ...
[2021-09-27 03:54:33,898] [INFO] [checkpointing.py:226:model_parallel_cuda_manual_seed] > initializing model parallel cuda seeds on global rank 0, model parallel rank 0, and data parallel rank 0 with model parallel seed: 2760 and data parallel seed: 42
> compiling dataset index builder ...
make: Entering directory '/gpfsssd/worksf/projects/rech/six/commun/code/tr8-104B/Megatron-DeepSpeed-tr8-104B/megatron/data'
make: Nothing to be done for 'default'.
make: Leaving directory '/gpfsssd/worksf/projects/rech/six/commun/code/tr8-104B/Megatron-DeepSpeed-tr8-104B/megatron/data'
>>> done with dataset index builder. Compilation time: 0.303 seconds
> compiling and loading fused kernels ...
/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning:
!! WARNING !!
Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension. See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source.
!! WARNING !!
warnings.warn(WRONG_COMPILER_WARNING.format(
Detected CUDA files, patching ldflags
Emitting ninja build file /gpfsssd/worksf/projects/rech/six/commun/code/tr8-104B/Megatron-DeepSpeed-tr8-104B/megatron/fused_kernels/build/build.ninja...
Building extension module scaled_upper_triang_masked_softmax_cuda...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
Loading extension module scaled_upper_triang_masked_softmax_cuda...
Building extension module scaled_masked_softmax_cuda...
ninja: no work to do.
Loading extension module scaled_masked_softmax_cuda...
Building extension module fused_mix_prec_layer_norm_cuda...
ninja: no work to do.
Loading extension module fused_mix_prec_layer_norm_cuda...
Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension. See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source. !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !! WARNING !! warnings.warn(WRONG_COMPILER_WARNING.format( /gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning: !! WARNING !! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension. See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source. !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !! WARNING !! warnings.warn(WRONG_COMPILER_WARNING.format( /gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning: !! WARNING !! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension. See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source. !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !! WARNING !! 
warnings.warn(WRONG_COMPILER_WARNING.format( /gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning: !! WARNING !! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension. See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source. !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !! WARNING !! warnings.warn(WRONG_COMPILER_WARNING.format( /gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning: !! WARNING !! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension. See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source. !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !! WARNING !! warnings.warn(WRONG_COMPILER_WARNING.format( /gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning: !! WARNING !! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to to compile your extension. 
Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension. See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source. !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !! WARNING !! warnings.warn(WRONG_COMPILER_WARNING.format( /gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning: !! WARNING !! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension. See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source. !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !! WARNING !! warnings.warn(WRONG_COMPILER_WARNING.format( /gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning: !! WARNING !! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension. See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source. !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !! WARNING !! warnings.warn(WRONG_COMPILER_WARNING.format( /gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning: !! WARNING !! 
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension. See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source. !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !! WARNING !! warnings.warn(WRONG_COMPILER_WARNING.format( /gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning: !! WARNING !! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension. See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source. !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !! WARNING !! warnings.warn(WRONG_COMPILER_WARNING.format( /gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning: !! WARNING !! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension. See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source. 
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !! WARNING !! warnings.warn(WRONG_COMPILER_WARNING.format( /gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning: !! WARNING !! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension. See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source. !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !! WARNING !! warnings.warn(WRONG_COMPILER_WARNING.format( /gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning: !! WARNING !! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension. See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source. !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !! WARNING !! warnings.warn(WRONG_COMPILER_WARNING.format( /gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning: !! WARNING !! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. 
Please use g++ to to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension. See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source. !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !! WARNING !! warnings.warn(WRONG_COMPILER_WARNING.format( /gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning: !! WARNING !! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension. See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source. !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !! WARNING !! warnings.warn(WRONG_COMPILER_WARNING.format( /gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning: !! WARNING !! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension. See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source. !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !! WARNING !! 
warnings.warn(WRONG_COMPILER_WARNING.format( /gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning: !! WARNING !! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension. See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source. !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !! WARNING !! warnings.warn(WRONG_COMPILER_WARNING.format( /gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning: !! WARNING !! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension. See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source. !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !! WARNING !! warnings.warn(WRONG_COMPILER_WARNING.format( /gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning: !! WARNING !! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to to compile your extension. 
Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension. See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source. !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !! WARNING !! warnings.warn(WRONG_COMPILER_WARNING.format( /gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning: !! WARNING !! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension. See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source. !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !! WARNING !! warnings.warn(WRONG_COMPILER_WARNING.format( /gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning: !! WARNING !! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension. See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source. !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !! WARNING !! warnings.warn(WRONG_COMPILER_WARNING.format( /gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning: !! WARNING !! 
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension. See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source. !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !! WARNING !! warnings.warn(WRONG_COMPILER_WARNING.format( /gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning: !! WARNING !! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension. See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source. !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !! WARNING !! warnings.warn(WRONG_COMPILER_WARNING.format( /gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning: !! WARNING !! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension. See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source. 
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !! WARNING !! warnings.warn(WRONG_COMPILER_WARNING.format( /gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning: !! WARNING !! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension. See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source. !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !! WARNING !! warnings.warn(WRONG_COMPILER_WARNING.format( /gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning: !! WARNING !! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension. See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source. !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !! WARNING !! warnings.warn(WRONG_COMPILER_WARNING.format( /gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning: !! WARNING !! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. 
Please use g++ to to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension. See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source. !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !! WARNING !! warnings.warn(WRONG_COMPILER_WARNING.format( /gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning: !! WARNING !! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension. See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source. !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !! WARNING !! warnings.warn(WRONG_COMPILER_WARNING.format( /gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning: !! WARNING !! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension. See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source. !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !! WARNING !! warnings.warn(WRONG_COMPILER_WARNING.format( >>> done with compiling and loading fused kernels. 
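The repeated WRONG_COMPILER_WARNING above is harmless in this run (ninja finds nothing to rebuild), but it can be addressed at the source. A minimal sketch, assuming torch.utils.cpp_extension picks the host compiler from the standard CXX environment variable (this holds for the torch 1.8 JIT build path, but verify on your version):

```python
import os

# Point the JIT extension build at the compiler PyTorch was built with.
# Must be set before Megatron's fused-kernel load triggers cpp_extension.load().
os.environ["CXX"] = "g++"

# Equivalently, `export CXX=g++` in the sbatch / launcher script.
```

Since the warning fires once per rank per extension, silencing it also makes logs like the one above considerably shorter.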
Compilation time: 20.521 seconds
time to initialize megatron (seconds): -17.161
[after megatron is initialized] datetime: 2021-09-27 03:54:54
building GPT model ...
[2021-09-27 03:54:54,901] [INFO] [utils.py:680:see_memory_usage] Before Building Model
/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch/cuda/memory.py:373: FutureWarning: torch.cuda.memory_cached has been renamed to torch.cuda.memory_reserved
warnings.warn(
/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch/cuda/memory.py:381: FutureWarning: torch.cuda.max_memory_cached has been renamed to torch.cuda.max_memory_reserved
warnings.warn(
[2021-09-27 03:54:54,903] [INFO] [utils.py:681:see_memory_usage] MA 0.0 GB Max_MA 0.0 GB CA 0.0 GB Max_CA 0 GB
[2021-09-27 03:54:54,903] [INFO] [utils.py:689:see_memory_usage] CPU Virtual Memory: used = 36.85 GB, percent = 19.7%
SEED_LAYERS=False BASE_SEED=1234 SEED_FN=None
Using topology: {ProcessCoord(pipe=0, data=0, model=0): 0, ProcessCoord(pipe=0, data=0, model=1): 1, ProcessCoord(pipe=0, data=0, model=2): 2, ProcessCoord(pipe=0, data=0, model=3): 3, ProcessCoord(pipe=0, data=1, model=0): 4, ..., ProcessCoord(pipe=1, data=0, model=0): 64, ..., ProcessCoord(pipe=2, data=0, model=0): 128, ..., ProcessCoord(pipe=2, data=8, model=1): 161, ..., ProcessCoord(pipe=3, data=0, model=0): 192, ..., ProcessCoord(pipe=3, data=3, model=1):
[topology dump elided: one entry per rank, with the model index varying fastest, then data, then pipe; the dump is truncated here in the original log]
205, ProcessCoord(pipe=3, data=3, model=2): 206, ProcessCoord(pipe=3, data=3, model=3): 207, ProcessCoord(pipe=3, data=4, model=0): 208, ProcessCoord(pipe=3, data=4, model=1): 209, ProcessCoord(pipe=3, data=4, model=2): 210, ProcessCoord(pipe=3, data=4, model=3): 211, ProcessCoord(pipe=3, data=5, model=0): 212, ProcessCoord(pipe=3, data=5, model=1): 213, ProcessCoord(pipe=3, data=5, model=2): 214, ProcessCoord(pipe=3, data=5, model=3): 215, ProcessCoord(pipe=3, data=6, model=0): 216, ProcessCoord(pipe=3, data=6, model=1): 217, ProcessCoord(pipe=3, data=6, model=2): 218, ProcessCoord(pipe=3, data=6, model=3): 219, ProcessCoord(pipe=3, data=7, model=0): 220, ProcessCoord(pipe=3, data=7, model=1): 221, ProcessCoord(pipe=3, data=7, model=2): 222, ProcessCoord(pipe=3, data=7, model=3): 223, ProcessCoord(pipe=3, data=8, model=0): 224, ProcessCoord(pipe=3, data=8, model=1): 225, ProcessCoord(pipe=3, data=8, model=2): 226, ProcessCoord(pipe=3, data=8, model=3): 227, ProcessCoord(pipe=3, data=9, model=0): 228, ProcessCoord(pipe=3, data=9, model=1): 229, ProcessCoord(pipe=3, data=9, model=2): 230, ProcessCoord(pipe=3, data=9, model=3): 231, ProcessCoord(pipe=3, data=10, model=0): 232, ProcessCoord(pipe=3, data=10, model=1): 233, ProcessCoord(pipe=3, data=10, model=2): 234, ProcessCoord(pipe=3, data=10, model=3): 235, ProcessCoord(pipe=3, data=11, model=0): 236, ProcessCoord(pipe=3, data=11, model=1): 237, ProcessCoord(pipe=3, data=11, model=2): 238, ProcessCoord(pipe=3, data=11, model=3): 239, ProcessCoord(pipe=3, data=12, model=0): 240, ProcessCoord(pipe=3, data=12, model=1): 241, ProcessCoord(pipe=3, data=12, model=2): 242, ProcessCoord(pipe=3, data=12, model=3): 243, ProcessCoord(pipe=3, data=13, model=0): 244, ProcessCoord(pipe=3, data=13, model=1): 245, ProcessCoord(pipe=3, data=13, model=2): 246, ProcessCoord(pipe=3, data=13, model=3): 247, ProcessCoord(pipe=3, data=14, model=0): 248, ProcessCoord(pipe=3, data=14, model=1): 249, ProcessCoord(pipe=3, data=14, model=2): 
250, ProcessCoord(pipe=3, data=14, model=3): 251, ProcessCoord(pipe=3, data=15, model=0): 252, ProcessCoord(pipe=3, data=15, model=1): 253, ProcessCoord(pipe=3, data=15, model=2): 254, ProcessCoord(pipe=3, data=15, model=3): 255, ProcessCoord(pipe=4, data=0, model=0): 256, ProcessCoord(pipe=4, data=0, model=1): 257, ProcessCoord(pipe=4, data=0, model=2): 258, ProcessCoord(pipe=4, data=0, model=3): 259, ProcessCoord(pipe=4, data=1, model=0): 260, ProcessCoord(pipe=4, data=1, model=1): 261, ProcessCoord(pipe=4, data=1, model=2): 262, ProcessCoord(pipe=4, data=1, model=3): 263, ProcessCoord(pipe=4, data=2, model=0): 264, ProcessCoord(pipe=4, data=2, model=1): 265, ProcessCoord(pipe=4, data=2, model=2): 266, ProcessCoord(pipe=4, data=2, model=3): 267, ProcessCoord(pipe=4, data=3, model=0): 268, ProcessCoord(pipe=4, data=3, model=1): 269, ProcessCoord(pipe=4, data=3, model=2): 270, ProcessCoord(pipe=4, data=3, model=3): 271, ProcessCoord(pipe=4, data=4, model=0): 272, ProcessCoord(pipe=4, data=4, model=1): 273, ProcessCoord(pipe=4, data=4, model=2): 274, ProcessCoord(pipe=4, data=4, model=3): 275, ProcessCoord(pipe=4, data=5, model=0): 276, ProcessCoord(pipe=4, data=5, model=1): 277, ProcessCoord(pipe=4, data=5, model=2): 278, ProcessCoord(pipe=4, data=5, model=3): 279, ProcessCoord(pipe=4, data=6, model=0): 280, ProcessCoord(pipe=4, data=6, model=1): 281, ProcessCoord(pipe=4, data=6, model=2): 282, ProcessCoord(pipe=4, data=6, model=3): 283, ProcessCoord(pipe=4, data=7, model=0): 284, ProcessCoord(pipe=4, data=7, model=1): 285, ProcessCoord(pipe=4, data=7, model=2): 286, ProcessCoord(pipe=4, data=7, model=3): 287, ProcessCoord(pipe=4, data=8, model=0): 288, ProcessCoord(pipe=4, data=8, model=1): 289, ProcessCoord(pipe=4, data=8, model=2): 290, ProcessCoord(pipe=4, data=8, model=3): 291, ProcessCoord(pipe=4, data=9, model=0): 292, ProcessCoord(pipe=4, data=9, model=1): 293, ProcessCoord(pipe=4, data=9, model=2): 294, ProcessCoord(pipe=4, data=9, model=3): 295, 
ProcessCoord(pipe=4, data=10, model=0): 296, ProcessCoord(pipe=4, data=10, model=1): 297, ProcessCoord(pipe=4, data=10, model=2): 298, ProcessCoord(pipe=4, data=10, model=3): 299, ProcessCoord(pipe=4, data=11, model=0): 300, ProcessCoord(pipe=4, data=11, model=1): 301, ProcessCoord(pipe=4, data=11, model=2): 302, ProcessCoord(pipe=4, data=11, model=3): 303, ProcessCoord(pipe=4, data=12, model=0): 304, ProcessCoord(pipe=4, data=12, model=1): 305, ProcessCoord(pipe=4, data=12, model=2): 306, ProcessCoord(pipe=4, data=12, model=3): 307, ProcessCoord(pipe=4, data=13, model=0): 308, ProcessCoord(pipe=4, data=13, model=1): 309, ProcessCoord(pipe=4, data=13, model=2): 310, ProcessCoord(pipe=4, data=13, model=3): 311, ProcessCoord(pipe=4, data=14, model=0): 312, ProcessCoord(pipe=4, data=14, model=1): 313, ProcessCoord(pipe=4, data=14, model=2): 314, ProcessCoord(pipe=4, data=14, model=3): 315, ProcessCoord(pipe=4, data=15, model=0): 316, ProcessCoord(pipe=4, data=15, model=1): 317, ProcessCoord(pipe=4, data=15, model=2): 318, ProcessCoord(pipe=4, data=15, model=3): 319, ProcessCoord(pipe=5, data=0, model=0): 320, ProcessCoord(pipe=5, data=0, model=1): 321, ProcessCoord(pipe=5, data=0, model=2): 322, ProcessCoord(pipe=5, data=0, model=3): 323, ProcessCoord(pipe=5, data=1, model=0): 324, ProcessCoord(pipe=5, data=1, model=1): 325, ProcessCoord(pipe=5, data=1, model=2): 326, ProcessCoord(pipe=5, data=1, model=3): 327, ProcessCoord(pipe=5, data=2, model=0): 328, ProcessCoord(pipe=5, data=2, model=1): 329, ProcessCoord(pipe=5, data=2, model=2): 330, ProcessCoord(pipe=5, data=2, model=3): 331, ProcessCoord(pipe=5, data=3, model=0): 332, ProcessCoord(pipe=5, data=3, model=1): 333, ProcessCoord(pipe=5, data=3, model=2): 334, ProcessCoord(pipe=5, data=3, model=3): 335, ProcessCoord(pipe=5, data=4, model=0): 336, ProcessCoord(pipe=5, data=4, model=1): 337, ProcessCoord(pipe=5, data=4, model=2): 338, ProcessCoord(pipe=5, data=4, model=3): 339, ProcessCoord(pipe=5, data=5, model=0): 
340, ProcessCoord(pipe=5, data=5, model=1): 341, ProcessCoord(pipe=5, data=5, model=2): 342, ProcessCoord(pipe=5, data=5, model=3): 343, ProcessCoord(pipe=5, data=6, model=0): 344, ProcessCoord(pipe=5, data=6, model=1): 345, ProcessCoord(pipe=5, data=6, model=2): 346, ProcessCoord(pipe=5, data=6, model=3): 347, ProcessCoord(pipe=5, data=7, model=0): 348, ProcessCoord(pipe=5, data=7, model=1): 349, ProcessCoord(pipe=5, data=7, model=2): 350, ProcessCoord(pipe=5, data=7, model=3): 351, ProcessCoord(pipe=5, data=8, model=0): 352, ProcessCoord(pipe=5, data=8, model=1): 353, ProcessCoord(pipe=5, data=8, model=2): 354, ProcessCoord(pipe=5, data=8, model=3): 355, ProcessCoord(pipe=5, data=9, model=0): 356, ProcessCoord(pipe=5, data=9, model=1): 357, ProcessCoord(pipe=5, data=9, model=2): 358, ProcessCoord(pipe=5, data=9, model=3): 359, ProcessCoord(pipe=5, data=10, model=0): 360, ProcessCoord(pipe=5, data=10, model=1): 361, ProcessCoord(pipe=5, data=10, model=2): 362, ProcessCoord(pipe=5, data=10, model=3): 363, ProcessCoord(pipe=5, data=11, model=0): 364, ProcessCoord(pipe=5, data=11, model=1): 365, ProcessCoord(pipe=5, data=11, model=2): 366, ProcessCoord(pipe=5, data=11, model=3): 367, ProcessCoord(pipe=5, data=12, model=0): 368, ProcessCoord(pipe=5, data=12, model=1): 369, ProcessCoord(pipe=5, data=12, model=2): 370, ProcessCoord(pipe=5, data=12, model=3): 371, ProcessCoord(pipe=5, data=13, model=0): 372, ProcessCoord(pipe=5, data=13, model=1): 373, ProcessCoord(pipe=5, data=13, model=2): 374, ProcessCoord(pipe=5, data=13, model=3): 375, ProcessCoord(pipe=5, data=14, model=0): 376, ProcessCoord(pipe=5, data=14, model=1): 377, ProcessCoord(pipe=5, data=14, model=2): 378, ProcessCoord(pipe=5, data=14, model=3): 379, ProcessCoord(pipe=5, data=15, model=0): 380, ProcessCoord(pipe=5, data=15, model=1): 381, ProcessCoord(pipe=5, data=15, model=2): 382, ProcessCoord(pipe=5, data=15, model=3): 383, ProcessCoord(pipe=6, data=0, model=0): 384, ProcessCoord(pipe=6, data=0, 
model=1): 385, ProcessCoord(pipe=6, data=0, model=2): 386, ProcessCoord(pipe=6, data=0, model=3): 387, ProcessCoord(pipe=6, data=1, model=0): 388, ProcessCoord(pipe=6, data=1, model=1): 389, ProcessCoord(pipe=6, data=1, model=2): 390, ProcessCoord(pipe=6, data=1, model=3): 391, ProcessCoord(pipe=6, data=2, model=0): 392, ProcessCoord(pipe=6, data=2, model=1): 393, ProcessCoord(pipe=6, data=2, model=2): 394, ProcessCoord(pipe=6, data=2, model=3): 395, ProcessCoord(pipe=6, data=3, model=0): 396, ProcessCoord(pipe=6, data=3, model=1): 397, ProcessCoord(pipe=6, data=3, model=2): 398, ProcessCoord(pipe=6, data=3, model=3): 399, ProcessCoord(pipe=6, data=4, model=0): 400, ProcessCoord(pipe=6, data=4, model=1): 401, ProcessCoord(pipe=6, data=4, model=2): 402, ProcessCoord(pipe=6, data=4, model=3): 403, ProcessCoord(pipe=6, data=5, model=0): 404, ProcessCoord(pipe=6, data=5, model=1): 405, ProcessCoord(pipe=6, data=5, model=2): 406, ProcessCoord(pipe=6, data=5, model=3): 407, ProcessCoord(pipe=6, data=6, model=0): 408, ProcessCoord(pipe=6, data=6, model=1): 409, ProcessCoord(pipe=6, data=6, model=2): 410, ProcessCoord(pipe=6, data=6, model=3): 411, ProcessCoord(pipe=6, data=7, model=0): 412, ProcessCoord(pipe=6, data=7, model=1): 413, ProcessCoord(pipe=6, data=7, model=2): 414, ProcessCoord(pipe=6, data=7, model=3): 415, ProcessCoord(pipe=6, data=8, model=0): 416, ProcessCoord(pipe=6, data=8, model=1): 417, ProcessCoord(pipe=6, data=8, model=2): 418, ProcessCoord(pipe=6, data=8, model=3): 419, ProcessCoord(pipe=6, data=9, model=0): 420, ProcessCoord(pipe=6, data=9, model=1): 421, ProcessCoord(pipe=6, data=9, model=2): 422, ProcessCoord(pipe=6, data=9, model=3): 423, ProcessCoord(pipe=6, data=10, model=0): 424, ProcessCoord(pipe=6, data=10, model=1): 425, ProcessCoord(pipe=6, data=10, model=2): 426, ProcessCoord(pipe=6, data=10, model=3): 427, ProcessCoord(pipe=6, data=11, model=0): 428, ProcessCoord(pipe=6, data=11, model=1): 429, ProcessCoord(pipe=6, data=11, model=2): 
430, ProcessCoord(pipe=6, data=11, model=3): 431, ProcessCoord(pipe=6, data=12, model=0): 432, ProcessCoord(pipe=6, data=12, model=1): 433, ProcessCoord(pipe=6, data=12, model=2): 434, ProcessCoord(pipe=6, data=12, model=3): 435, ProcessCoord(pipe=6, data=13, model=0): 436, ProcessCoord(pipe=6, data=13, model=1): 437, ProcessCoord(pipe=6, data=13, model=2): 438, ProcessCoord(pipe=6, data=13, model=3): 439, ProcessCoord(pipe=6, data=14, model=0): 440, ProcessCoord(pipe=6, data=14, model=1): 441, ProcessCoord(pipe=6, data=14, model=2): 442, ProcessCoord(pipe=6, data=14, model=3): 443, ProcessCoord(pipe=6, data=15, model=0): 444, ProcessCoord(pipe=6, data=15, model=1): 445, ProcessCoord(pipe=6, data=15, model=2): 446, ProcessCoord(pipe=6, data=15, model=3): 447, ProcessCoord(pipe=7, data=0, model=0): 448, ProcessCoord(pipe=7, data=0, model=1): 449, ProcessCoord(pipe=7, data=0, model=2): 450, ProcessCoord(pipe=7, data=0, model=3): 451, ProcessCoord(pipe=7, data=1, model=0): 452, ProcessCoord(pipe=7, data=1, model=1): 453, ProcessCoord(pipe=7, data=1, model=2): 454, ProcessCoord(pipe=7, data=1, model=3): 455, ProcessCoord(pipe=7, data=2, model=0): 456, ProcessCoord(pipe=7, data=2, model=1): 457, ProcessCoord(pipe=7, data=2, model=2): 458, ProcessCoord(pipe=7, data=2, model=3): 459, ProcessCoord(pipe=7, data=3, model=0): 460, ProcessCoord(pipe=7, data=3, model=1): 461, ProcessCoord(pipe=7, data=3, model=2): 462, ProcessCoord(pipe=7, data=3, model=3): 463, ProcessCoord(pipe=7, data=4, model=0): 464, ProcessCoord(pipe=7, data=4, model=1): 465, ProcessCoord(pipe=7, data=4, model=2): 466, ProcessCoord(pipe=7, data=4, model=3): 467, ProcessCoord(pipe=7, data=5, model=0): 468, ProcessCoord(pipe=7, data=5, model=1): 469, ProcessCoord(pipe=7, data=5, model=2): 470, ProcessCoord(pipe=7, data=5, model=3): 471, ProcessCoord(pipe=7, data=6, model=0): 472, ProcessCoord(pipe=7, data=6, model=1): 473, ProcessCoord(pipe=7, data=6, model=2): 474, ProcessCoord(pipe=7, data=6, model=3): 
475, ProcessCoord(pipe=7, data=7, model=0): 476, ProcessCoord(pipe=7, data=7, model=1): 477, ProcessCoord(pipe=7, data=7, model=2): 478, ProcessCoord(pipe=7, data=7, model=3): 479, ProcessCoord(pipe=7, data=8, model=0): 480, ProcessCoord(pipe=7, data=8, model=1): 481, ProcessCoord(pipe=7, data=8, model=2): 482, ProcessCoord(pipe=7, data=8, model=3): 483, ProcessCoord(pipe=7, data=9, model=0): 484, ProcessCoord(pipe=7, data=9, model=1): 485, ProcessCoord(pipe=7, data=9, model=2): 486, ProcessCoord(pipe=7, data=9, model=3): 487, ProcessCoord(pipe=7, data=10, model=0): 488, ProcessCoord(pipe=7, data=10, model=1): 489, ProcessCoord(pipe=7, data=10, model=2): 490, ProcessCoord(pipe=7, data=10, model=3): 491, ProcessCoord(pipe=7, data=11, model=0): 492, ProcessCoord(pipe=7, data=11, model=1): 493, ProcessCoord(pipe=7, data=11, model=2): 494, ProcessCoord(pipe=7, data=11, model=3): 495, ProcessCoord(pipe=7, data=12, model=0): 496, ProcessCoord(pipe=7, data=12, model=1): 497, ProcessCoord(pipe=7, data=12, model=2): 498, ProcessCoord(pipe=7, data=12, model=3): 499, ProcessCoord(pipe=7, data=13, model=0): 500, ProcessCoord(pipe=7, data=13, model=1): 501, ProcessCoord(pipe=7, data=13, model=2): 502, ProcessCoord(pipe=7, data=13, model=3): 503, ProcessCoord(pipe=7, data=14, model=0): 504, ProcessCoord(pipe=7, data=14, model=1): 505, ProcessCoord(pipe=7, data=14, model=2): 506, ProcessCoord(pipe=7, data=14, model=3): 507, ProcessCoord(pipe=7, data=15, model=0): 508, ProcessCoord(pipe=7, data=15, model=1): 509, ProcessCoord(pipe=7, data=15, model=2): 510, ProcessCoord(pipe=7, data=15, model=3): 511}
[2021-09-27 03:54:57,678] [INFO] [module.py:360:_partition_layers] Partitioning pipeline stages with method type:transformer
stage=0 layers=7
    0: _to_float16
    1: EmbeddingPipe
    2:
    3: ParallelTransformerLayerPipe
    4: ParallelTransformerLayerPipe
    5: ParallelTransformerLayerPipe
    6: ParallelTransformerLayerPipe
stage=1 layers=4
    7: ParallelTransformerLayerPipe
    8: ParallelTransformerLayerPipe
    9: ParallelTransformerLayerPipe
    10: ParallelTransformerLayerPipe
stage=2 layers=4
    11: ParallelTransformerLayerPipe
    12: ParallelTransformerLayerPipe
    13: ParallelTransformerLayerPipe
    14: ParallelTransformerLayerPipe
stage=3 layers=4
    15: ParallelTransformerLayerPipe
    16: ParallelTransformerLayerPipe
    17: ParallelTransformerLayerPipe
    18: ParallelTransformerLayerPipe
stage=4 layers=4
    19: ParallelTransformerLayerPipe
    20: ParallelTransformerLayerPipe
    21: ParallelTransformerLayerPipe
    22: ParallelTransformerLayerPipe
stage=5 layers=4
    23: ParallelTransformerLayerPipe
    24: ParallelTransformerLayerPipe
    25: ParallelTransformerLayerPipe
    26: ParallelTransformerLayerPipe
stage=6 layers=4
    27: ParallelTransformerLayerPipe
    28: ParallelTransformerLayerPipe
    29: ParallelTransformerLayerPipe
    30: ParallelTransformerLayerPipe
stage=7 layers=8
    31: ParallelTransformerLayerPipe
    32: ParallelTransformerLayerPipe
    33: ParallelTransformerLayerPipe
    34: ParallelTransformerLayerPipe
    35:
    36: MixedFusedLayerNorm
    37: EmbeddingPipe
    38: float16_to_fp32
  loss: CrossEntropy
> number of parameters on (tensor, pipeline) model parallel rank (2, 2): 1745293312
> number of parameters on (tensor, pipeline) model parallel rank (0, 2): 1745293312
> number of parameters on (tensor, pipeline) model parallel rank (3, 2): 1745293312
> number of parameters on (tensor, pipeline) model parallel rank (1, 2): 1745293312
> number of parameters on (tensor, pipeline) model parallel rank (0, 4): 1745293312
> number of parameters on (tensor, pipeline) model parallel rank (2, 4): 1745293312
> number of parameters on (tensor, pipeline) model parallel rank (1, 4): 1745293312
> number of parameters on (tensor, pipeline) model parallel rank (3, 4): 1745293312
> number of parameters on (tensor, pipeline) model parallel rank (0, 6): 1745293312
> number of parameters on (tensor, pipeline) model parallel rank (3, 6): 1745293312
> number of parameters on (tensor, pipeline) model parallel rank (1, 6): 1745293312
> number of parameters on (tensor, pipeline) model parallel rank (2, 6): 1745293312
> number of parameters on (tensor, pipeline) model parallel rank (3, 1): 1745293312
> number of parameters on (tensor, pipeline) model parallel rank (2, 1): 1745293312
> number of parameters on (tensor, pipeline) model parallel rank (1, 5): 1745293312
> number of parameters on (tensor, pipeline) model parallel rank (3, 5): 1745293312
> number of parameters on (tensor, pipeline) model parallel rank (2, 5): 1745293312
> number of parameters on (tensor, pipeline) model parallel rank (3, 3): 1745293312
> number of parameters on (tensor, pipeline) model parallel rank (0, 5): 1745293312
> number of parameters on (tensor, pipeline) model parallel rank (0, 3): 1745293312
> number of parameters on (tensor, pipeline) model parallel rank (2, 3): 1745293312
> number of parameters on (tensor, pipeline) model parallel rank (1, 1): 1745293312
> number of parameters on (tensor, pipeline) model parallel rank (1, 3): 1745293312
> number of parameters on (tensor, pipeline) model parallel rank (0, 1): 1745293312
> number of parameters on (tensor, pipeline) model parallel rank (2, 7): 1986498560
> number of parameters on (tensor, pipeline) model parallel rank (2, 0): 1986465792
> number of parameters on (tensor, pipeline) model parallel rank (1, 7): 1986498560
> number of parameters on (tensor, pipeline) model parallel rank (0, 7): 1986498560
> number of parameters on (tensor, pipeline) model parallel rank (1, 0): 1986465792
> number of parameters on (tensor, pipeline) model parallel rank (3, 0): 1986465792
> number of parameters on (tensor, pipeline) model parallel rank (3, 7): 1986498560
[2021-09-27 03:54:59,504] [INFO] [utils.py:680:see_memory_usage] After Building Model
[2021-09-27 03:54:59,505] [INFO] [utils.py:681:see_memory_usage] MA 3.77 GB Max_MA 3.79 GB CA 3.79 GB Max_CA 4 GB
[2021-09-27 03:54:59,505] [INFO] [utils.py:689:see_memory_usage] CPU Virtual Memory: used = 37.03 GB, percent = 19.8%
> number of parameters on (tensor, pipeline) model parallel rank (0, 0): 1986465792
setting training iterations to 159576
> learning rate decay style: cosine
DeepSpeed is enabled.
[2021-09-27 03:54:59,644] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed info: version=0.4.2+bc17042, git-hash=bc17042, git-branch=big-science
[2021-09-27 03:54:59,753] [INFO] [engine.py:179:__init__] DeepSpeed Flops Profiler Enabled: False
[2021-09-27 03:54:59,753] [INFO] [engine.py:736:_configure_optimizer] Removing param_group that has no 'params' in the client Optimizer
[2021-09-27 03:54:59,753] [INFO] [engine.py:741:_configure_optimizer] Using client Optimizer as basic optimizer
[2021-09-27 03:54:59,753] [INFO] [engine.py:750:_configure_optimizer] DeepSpeed Basic Optimizer = FusedAdam
[2021-09-27 03:54:59,753] [INFO] [utils.py:43:is_zero_supported_optimizer] Checking ZeRO support for optimizer=FusedAdam type=
[2021-09-27 03:54:59,753] [INFO] [logging.py:68:log_dist] [Rank 0] Creating fp16 ZeRO stage 1 optimizer
[2021-09-27 03:54:59,753] [INFO] [stage2.py:106:__init__] Reduce bucket size 500000000
[2021-09-27 03:54:59,753] [INFO] [stage2.py:107:__init__] Allgather bucket size 500000000
[2021-09-27 03:54:59,753] [INFO] [stage2.py:108:__init__] CPU Offload: False
[2021-09-27 03:54:59,753] [INFO] [stage2.py:109:__init__] Round robin gradient partitioning: False
[2021-09-27 03:55:04,471] [INFO] [stage2.py:419:__init__] optimizer state initialized
[2021-09-27 03:55:04,471] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed Final Optimizer = FusedAdam
[2021-09-27 03:55:04,471] [INFO] [engine.py:553:_configure_lr_scheduler] DeepSpeed using client LR scheduler
[2021-09-27 03:55:04,471] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed LR Scheduler =
[2021-09-27 03:55:04,471] [INFO] [logging.py:68:log_dist] [Rank 0] step=0, skipped=0, lr=[0.0, 0.0], mom=[(0.9, 0.999), (0.9, 0.999)]
[2021-09-27 03:55:04,471] [INFO] [config.py:900:print] DeepSpeedEngine configuration:
[2021-09-27 03:55:04,472] [INFO]
[config.py:904:print] activation_checkpointing_config { "partition_activations": false, "contiguous_memory_optimization": false, "cpu_checkpointing": false, "number_checkpoints": null, "synchronize_checkpoint_boundary": false, "profile": false }
[2021-09-27 03:55:04,472] [INFO] [config.py:904:print] aio_config ................... {'block_size': 1048576, 'queue_depth': 8, 'thread_count': 1, 'single_submit': False, 'overlap_events': True}
[2021-09-27 03:55:04,472] [INFO] [config.py:904:print] allreduce_always_fp32 ........ False
[2021-09-27 03:55:04,472] [INFO] [config.py:904:print] amp_enabled .................. False
[2021-09-27 03:55:04,472] [INFO] [config.py:904:print] amp_params ................... False
[2021-09-27 03:55:04,472] [INFO] [config.py:904:print] checkpoint_tag_validation_enabled True
[2021-09-27 03:55:04,472] [INFO] [config.py:904:print] checkpoint_tag_validation_fail False
[2021-09-27 03:55:04,472] [INFO] [config.py:904:print] disable_allgather ............ False
[2021-09-27 03:55:04,472] [INFO] [config.py:904:print] dump_state ................... False
[2021-09-27 03:55:04,472] [INFO] [config.py:904:print] dynamic_loss_scale_args ...... {'init_scale': 4096, 'scale_window': 500, 'delayed_shift': 2, 'min_scale': 1}
[2021-09-27 03:55:04,472] [INFO] [config.py:904:print] eigenvalue_enabled ........... False
[2021-09-27 03:55:04,472] [INFO] [config.py:904:print] eigenvalue_gas_boundary_resolution 1
[2021-09-27 03:55:04,472] [INFO] [config.py:904:print] eigenvalue_layer_name ........ bert.encoder.layer
[2021-09-27 03:55:04,472] [INFO] [config.py:904:print] eigenvalue_layer_num ......... 0
[2021-09-27 03:55:04,472] [INFO] [config.py:904:print] eigenvalue_max_iter .......... 100
[2021-09-27 03:55:04,472] [INFO] [config.py:904:print] eigenvalue_stability ......... 1e-06
[2021-09-27 03:55:04,472] [INFO] [config.py:904:print] eigenvalue_tol ............... 0.01
[2021-09-27 03:55:04,472] [INFO] [config.py:904:print] eigenvalue_verbose ........... False
[2021-09-27 03:55:04,472] [INFO] [config.py:904:print] elasticity_enabled ........... False
[2021-09-27 03:55:04,472] [INFO] [config.py:904:print] flops_profiler_config ........ { "enabled": false, "profile_step": 1, "module_depth": -1, "top_modules": 1, "detailed": true, "output_file": null }
[2021-09-27 03:55:04,472] [INFO] [config.py:904:print] fp16_enabled ................. True
[2021-09-27 03:55:04,472] [INFO] [config.py:904:print] fp16_mixed_quantize .......... False
[2021-09-27 03:55:04,472] [INFO] [config.py:904:print] global_rank .................. 0
[2021-09-27 03:55:04,472] [INFO] [config.py:904:print] gradient_accumulation_steps .. 128
[2021-09-27 03:55:04,472] [INFO] [config.py:904:print] gradient_clipping ............ 1.0
[2021-09-27 03:55:04,472] [INFO] [config.py:904:print] gradient_predivide_factor .... 1.0
[2021-09-27 03:55:04,472] [INFO] [config.py:904:print] initial_dynamic_scale ........ 4096
[2021-09-27 03:55:04,472] [INFO] [config.py:904:print] loss_scale ................... 0
[2021-09-27 03:55:04,472] [INFO] [config.py:904:print] memory_breakdown ............. False
[2021-09-27 03:55:04,472] [INFO] [config.py:904:print] optimizer_legacy_fusion ...... False
[2021-09-27 03:55:04,472] [INFO] [config.py:904:print] optimizer_name ............... None
[2021-09-27 03:55:04,472] [INFO] [config.py:904:print] optimizer_params ............. None
[2021-09-27 03:55:04,473] [INFO] [config.py:904:print] pipeline ..................... {'stages': 'auto', 'partition': 'best', 'seed_layers': False, 'activation_checkpoint_interval': 0}
[2021-09-27 03:55:04,473] [INFO] [config.py:904:print] pld_enabled .................. False
[2021-09-27 03:55:04,473] [INFO] [config.py:904:print] pld_params ................... False
[2021-09-27 03:55:04,473] [INFO] [config.py:904:print] prescale_gradients ........... False
[2021-09-27 03:55:04,473] [INFO] [config.py:904:print] quantize_change_rate ......... 0.001
[2021-09-27 03:55:04,473] [INFO] [config.py:904:print] quantize_groups .............. 1
[2021-09-27 03:55:04,473] [INFO] [config.py:904:print] quantize_offset .............. 1000
[2021-09-27 03:55:04,473] [INFO] [config.py:904:print] quantize_period .............. 1000
[2021-09-27 03:55:04,473] [INFO] [config.py:904:print] quantize_rounding ............ 0
[2021-09-27 03:55:04,473] [INFO] [config.py:904:print] quantize_start_bits .......... 16
[2021-09-27 03:55:04,473] [INFO] [config.py:904:print] quantize_target_bits ......... 8
[2021-09-27 03:55:04,473] [INFO] [config.py:904:print] quantize_training_enabled .... False
[2021-09-27 03:55:04,473] [INFO] [config.py:904:print] quantize_type ................ 0
[2021-09-27 03:55:04,473] [INFO] [config.py:904:print] quantize_verbose ............. False
[2021-09-27 03:55:04,473] [INFO] [config.py:904:print] scheduler_name ............... None
[2021-09-27 03:55:04,473] [INFO] [config.py:904:print] scheduler_params ............. None
[2021-09-27 03:55:04,473] [INFO] [config.py:904:print] sparse_attention ............. None
[2021-09-27 03:55:04,473] [INFO] [config.py:904:print] sparse_gradients_enabled ..... False
[2021-09-27 03:55:04,473] [INFO] [config.py:904:print] steps_per_print .............. 2000
[2021-09-27 03:55:04,473] [INFO] [config.py:904:print] tensorboard_enabled .......... False
[2021-09-27 03:55:04,473] [INFO] [config.py:904:print] tensorboard_job_name ......... DeepSpeedJobName
[2021-09-27 03:55:04,473] [INFO] [config.py:904:print] tensorboard_output_path ......
[2021-09-27 03:55:04,473] [INFO] [config.py:904:print] train_batch_size ............. 2048
[2021-09-27 03:55:04,473] [INFO] [config.py:904:print] train_micro_batch_size_per_gpu 1
[2021-09-27 03:55:04,473] [INFO] [config.py:904:print] use_quantizer_kernel ......... False
[2021-09-27 03:55:04,473] [INFO] [config.py:904:print] wall_clock_breakdown ......... False
[2021-09-27 03:55:04,473] [INFO] [config.py:904:print] world_size ................... 16
[2021-09-27 03:55:04,473] [INFO] [config.py:904:print] zero_allow_untested_optimizer False
[2021-09-27 03:55:04,473] [INFO] [config.py:904:print] zero_config .................. { "stage": 1, "contiguous_gradients": false, "reduce_scatter": true, "reduce_bucket_size": 5.000000e+08, "allgather_partitions": true, "allgather_bucket_size": 5.000000e+08, "overlap_comm": false, "load_from_fp32_weights": true, "elastic_checkpoint": true, "offload_param": null, "offload_optimizer": null, "sub_group_size": 1.000000e+09, "prefetch_bucket_size": 5.000000e+07, "param_persistence_threshold": 1.000000e+05, "max_live_parameters": 1.000000e+09, "max_reuse_distance": 1.000000e+09, "gather_fp16_weights_on_model_save": false, "ignore_unused_parameters": true, "round_robin_gradients": false, "legacy_stage1": false }
[2021-09-27 03:55:04,473] [INFO] [config.py:904:print] zero_enabled ................. True
[2021-09-27 03:55:04,473] [INFO] [config.py:904:print] zero_optimization_stage ...... 1
[2021-09-27 03:55:04,473] [INFO] [config.py:906:print] json = { "train_micro_batch_size_per_gpu": 1, "train_batch_size": 2.048000e+03, "gradient_clipping": 1.0, "zero_optimization": { "stage": 1 }, "fp16": { "enabled": true, "loss_scale": 0, "loss_scale_window": 500, "hysteresis": 2, "min_loss_scale": 1, "initial_scale_power": 12 }, "steps_per_print": 2.000000e+03, "wall_clock_breakdown": false }
[2021-09-27 03:55:04,474] [INFO] [engine.py:76:__init__] CONFIG: micro_batches=128 micro_batch_size=1
[2021-09-27 03:55:04,910] [INFO] [engine.py:134:__init__] RANK=0 STAGE=0 LAYERS=7 [0, 7) STAGE_PARAMS=1986465792 (1986.466M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-27 03:55:04,910] [INFO] [engine.py:134:__init__] RANK=2 STAGE=0 LAYERS=7 [0, 7) STAGE_PARAMS=1986465792 (1986.466M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-27 03:55:04,910] [INFO] [engine.py:134:__init__] RANK=1 STAGE=0 LAYERS=7 [0, 7)
STAGE_PARAMS=1986465792 (1986.466M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-27 03:55:04,911] [INFO] [engine.py:134:__init__] RANK=3 STAGE=0 LAYERS=7 [0, 7) STAGE_PARAMS=1986465792 (1986.466M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-27 03:55:04,911] [INFO] [engine.py:134:__init__] RANK=259 STAGE=4 LAYERS=4 [19, 23) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-27 03:55:04,911] [INFO] [engine.py:134:__init__] RANK=256 STAGE=4 LAYERS=4 [19, 23) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-27 03:55:04,911] [INFO] [engine.py:134:__init__] RANK=258 STAGE=4 LAYERS=4 [19, 23) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-27 03:55:04,911] [INFO] [engine.py:134:__init__] RANK=257 STAGE=4 LAYERS=4 [19, 23) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-27 03:55:04,911] [INFO] [engine.py:134:__init__] RANK=130 STAGE=2 LAYERS=4 [11, 15) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-27 03:55:04,911] [INFO] [engine.py:134:__init__] RANK=129 STAGE=2 LAYERS=4 [11, 15) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-27 03:55:04,911] [INFO] [engine.py:134:__init__] RANK=131 STAGE=2 LAYERS=4 [11, 15) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-27 03:55:04,911] [INFO] [engine.py:134:__init__] RANK=128 STAGE=2 LAYERS=4 [11, 15) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-27 03:55:04,911] [INFO] [engine.py:134:__init__] RANK=384 STAGE=6 LAYERS=4 [27, 31) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-27 03:55:04,911] [INFO] [engine.py:134:__init__] RANK=385 STAGE=6 LAYERS=4 [27, 31) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-27 03:55:04,911] [INFO] [engine.py:134:__init__] RANK=386 STAGE=6 LAYERS=4 [27, 31) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-27 03:55:04,911] [INFO] [engine.py:134:__init__] RANK=387 STAGE=6 LAYERS=4 [27, 31) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-27 03:55:04,911] [INFO] [engine.py:134:__init__] RANK=194 STAGE=3 LAYERS=4 [15, 19) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-27 03:55:04,911] [INFO] [engine.py:134:__init__] RANK=195 STAGE=3 LAYERS=4 [15, 19) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-27 03:55:04,911] [INFO] [engine.py:134:__init__] RANK=193 STAGE=3 LAYERS=4 [15, 19) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-27 03:55:04,911] [INFO] [engine.py:134:__init__] RANK=192 STAGE=3 LAYERS=4 [15, 19) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-27 03:55:04,911] [INFO] [engine.py:134:__init__] RANK=449 STAGE=7 LAYERS=8 [31, 39) STAGE_PARAMS=1986498560 (1986.499M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-27 03:55:04,911] [INFO] [engine.py:134:__init__] RANK=448 STAGE=7 LAYERS=8 [31, 39) STAGE_PARAMS=1986498560 (1986.499M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-27 03:55:04,911] [INFO] [engine.py:134:__init__] RANK=451 STAGE=7 LAYERS=8 [31, 39) STAGE_PARAMS=1986498560 (1986.499M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-27 03:55:04,911] [INFO] [engine.py:134:__init__] RANK=321 STAGE=5 LAYERS=4 [23, 27) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-27 03:55:04,911] [INFO] [engine.py:134:__init__] RANK=320 STAGE=5 LAYERS=4 [23, 27) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-27 03:55:04,911] [INFO] [engine.py:134:__init__] RANK=322 STAGE=5 LAYERS=4 [23, 27) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-27 03:55:04,911] [INFO] [engine.py:134:__init__] RANK=323 STAGE=5 LAYERS=4 [23, 27) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-27 03:55:04,911] [INFO] [engine.py:134:__init__] RANK=66 STAGE=1 LAYERS=4 [7, 11) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-27 03:55:04,911] [INFO] [engine.py:134:__init__] RANK=67 STAGE=1 LAYERS=4 [7, 11) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-27 03:55:04,911] [INFO] [engine.py:134:__init__] RANK=64 STAGE=1 LAYERS=4 [7, 11) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-27 03:55:04,911] [INFO] [engine.py:134:__init__] RANK=450 STAGE=7 LAYERS=8 [31, 39) STAGE_PARAMS=1986498560 (1986.499M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-27 03:55:04,911] [INFO] [engine.py:134:__init__] RANK=65 STAGE=1 LAYERS=4 [7, 11) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
> using checkpoint value 6e-05 for learning rate
> using checkpoint value 6e-06 for minimum learning rate
> using checkpoint value 216320 for warmup iterations
> using checkpoint value 126953125 for total number of iterations
> using checkpoint value cosine for decay style
successfully loaded 8 ZeRO state_dicts for rank 384
successfully loaded 8 ZeRO state_dicts for rank 424
successfully loaded 8 ZeRO state_dicts for rank 444
successfully loaded 8 ZeRO state_dicts for rank 400
successfully loaded 8 ZeRO state_dicts for rank 261
successfully loaded 8 ZeRO state_dicts for rank 432
successfully loaded 8 ZeRO state_dicts for rank 420
successfully loaded 8 ZeRO state_dicts for rank 152
successfully loaded 8 ZeRO state_dicts for rank 440
successfully loaded 8 ZeRO state_dicts for rank 387
successfully loaded 8 ZeRO state_dicts for rank 296
successfully loaded 8 ZeRO state_dicts for rank 392
successfully loaded 8 ZeRO state_dicts for rank 196
successfully loaded 8 ZeRO state_dicts for rank 338
successfully loaded 8 ZeRO state_dicts for rank 379
successfully loaded 8 ZeRO state_dicts for rank 336
loading 8 zero partition checkpoints for rank 384
successfully loaded 8 ZeRO state_dicts for rank 385
successfully loaded 8 ZeRO state_dicts for rank 445
successfully loaded 8 ZeRO state_dicts for rank 84
successfully loaded 8 ZeRO state_dicts for rank 86
successfully loaded 8 ZeRO state_dicts for rank 428
successfully loaded 8 ZeRO state_dicts for rank 337
successfully loaded 8 ZeRO state_dicts for rank 416
successfully loaded 8 ZeRO state_dicts for rank 436
loading 8 zero partition checkpoints for rank 424
successfully loaded 8 ZeRO state_dicts for rank 88
loading 8 zero partition checkpoints for rank 444
successfully loaded 8 ZeRO state_dicts for rank 376
successfully loaded 8 ZeRO state_dicts for rank 125
successfully loaded 8 ZeRO state_dicts for rank 197
successfully loaded 8 ZeRO state_dicts for rank 198 successfully loaded 8 ZeRO state_dicts for rank 388 successfully loaded 8 ZeRO state_dicts for rank 238 successfully loaded 8 ZeRO state_dicts for rank 248 successfully loaded 8 ZeRO state_dicts for rank 174 loading 8 zero partition checkpoints for rank 261 successfully loaded 8 ZeRO state_dicts for rank 250 loading 8 zero partition checkpoints for rank 400 successfully loaded 8 ZeRO state_dicts for rank 192 successfully loaded 8 ZeRO state_dicts for rank 277 successfully loaded 8 ZeRO state_dicts for rank 437 successfully loaded 8 ZeRO state_dicts for rank 204 successfully loaded 8 ZeRO state_dicts for rank 297 loading 8 zero partition checkpoints for rank 432 successfully loaded 8 ZeRO state_dicts for rank 158 successfully loaded 8 ZeRO state_dicts for rank 99 successfully loaded 8 ZeRO state_dicts for rank 194 successfully loaded 8 ZeRO state_dicts for rank 199 successfully loaded 8 ZeRO state_dicts for rank 382 successfully loaded 8 ZeRO state_dicts for rank 332 successfully loaded 8 ZeRO state_dicts for rank 245 successfully loaded 8 ZeRO state_dicts for rank 441 successfully loaded 8 ZeRO state_dicts for rank 299 successfully loaded 8 ZeRO state_dicts for rank 242 successfully loaded 8 ZeRO state_dicts for rank 391 loading 8 zero partition checkpoints for rank 420 successfully loaded 8 ZeRO state_dicts for rank 234 successfully loaded 8 ZeRO state_dicts for rank 380 successfully loaded 8 ZeRO state_dicts for rank 433 successfully loaded 8 ZeRO state_dicts for rank 423 successfully loaded 8 ZeRO state_dicts for rank 425 loading 8 zero partition checkpoints for rank 440 loading 8 zero partition checkpoints for rank 152 successfully loaded 8 ZeRO state_dicts for rank 232 successfully loaded 8 ZeRO state_dicts for rank 246 successfully loaded 8 ZeRO state_dicts for rank 401 successfully loaded 8 ZeRO state_dicts for rank 89 successfully loaded 8 ZeRO state_dicts for rank 241 successfully loaded 8 ZeRO 
state_dicts for rank 155 successfully loaded 8 ZeRO state_dicts for rank 394 successfully loaded 8 ZeRO state_dicts for rank 178 successfully loaded 8 ZeRO state_dicts for rank 257 successfully loaded 8 ZeRO state_dicts for rank 429 successfully loaded 8 ZeRO state_dicts for rank 422 successfully loaded 8 ZeRO state_dicts for rank 265 successfully loaded 8 ZeRO state_dicts for rank 340 successfully loaded 8 ZeRO state_dicts for rank 256 successfully loaded 8 ZeRO state_dicts for rank 229 successfully loaded 8 ZeRO state_dicts for rank 218 loading 8 zero partition checkpoints for rank 387 successfully loaded 8 ZeRO state_dicts for rank 421 successfully loaded 8 ZeRO state_dicts for rank 121 successfully loaded 8 ZeRO state_dicts for rank 153 successfully loaded 8 ZeRO state_dicts for rank 182 successfully loaded 8 ZeRO state_dicts for rank 447 successfully loaded 8 ZeRO state_dicts for rank 216 successfully loaded 8 ZeRO state_dicts for rank 237 successfully loaded 8 ZeRO state_dicts for rank 403 successfully loaded 8 ZeRO state_dicts for rank 378 successfully loaded 8 ZeRO state_dicts for rank 341 successfully loaded 8 ZeRO state_dicts for rank 389 successfully loaded 8 ZeRO state_dicts for rank 367 successfully loaded 8 ZeRO state_dicts for rank 236 successfully loaded 8 ZeRO state_dicts for rank 292 successfully loaded 8 ZeRO state_dicts for rank 298 successfully loaded 8 ZeRO state_dicts for rank 393 successfully loaded 8 ZeRO state_dicts for rank 126 successfully loaded 8 ZeRO state_dicts for rank 180 successfully loaded 8 ZeRO state_dicts for rank 383 successfully loaded 8 ZeRO state_dicts for rank 446 successfully loaded 8 ZeRO state_dicts for rank 366 successfully loaded 8 ZeRO state_dicts for rank 443 loading 8 zero partition checkpoints for rank 392 successfully loaded 8 ZeRO state_dicts for rank 278 successfully loaded 8 ZeRO state_dicts for rank 96 successfully loaded 8 ZeRO state_dicts for rank 69 successfully loaded 8 ZeRO state_dicts for rank 136 
successfully loaded 8 ZeRO state_dicts for rank 386 successfully loaded 8 ZeRO state_dicts for rank 408 loading 8 zero partition checkpoints for rank 338 successfully loaded 8 ZeRO state_dicts for rank 109 loading 8 zero partition checkpoints for rank 196 successfully loaded 8 ZeRO state_dicts for rank 154 successfully loaded 8 ZeRO state_dicts for rank 430 successfully loaded 8 ZeRO state_dicts for rank 342 successfully loaded 8 ZeRO state_dicts for rank 206 successfully loaded 8 ZeRO state_dicts for rank 128 successfully loaded 8 ZeRO state_dicts for rank 123 successfully loaded 8 ZeRO state_dicts for rank 339 successfully loaded 8 ZeRO state_dicts for rank 233 successfully loaded 8 ZeRO state_dicts for rank 235 successfully loaded 8 ZeRO state_dicts for rank 279 successfully loaded 8 ZeRO state_dicts for rank 285 successfully loaded 8 ZeRO state_dicts for rank 219 successfully loaded 8 ZeRO state_dicts for rank 124 successfully loaded 8 ZeRO state_dicts for rank 90 successfully loaded 8 ZeRO state_dicts for rank 249 successfully loaded 8 ZeRO state_dicts for rank 343 successfully loaded 8 ZeRO state_dicts for rank 132 successfully loaded 8 ZeRO state_dicts for rank 150 successfully loaded 8 ZeRO state_dicts for rank 450 successfully loaded 8 ZeRO state_dicts for rank 313 successfully loaded 8 ZeRO state_dicts for rank 293 successfully loaded 8 ZeRO state_dicts for rank 381 successfully loaded 8 ZeRO state_dicts for rank 364 successfully loaded 8 ZeRO state_dicts for rank 251 successfully loaded 8 ZeRO state_dicts for rank 65 loading 8 zero partition checkpoints for rank 379 successfully loaded 8 ZeRO state_dicts for rank 266 successfully loaded 8 ZeRO state_dicts for rank 365 loading 8 zero partition checkpoints for rank 296 successfully loaded 8 ZeRO state_dicts for rank 442 successfully loaded 8 ZeRO state_dicts for rank 243 successfully loaded 8 ZeRO state_dicts for rank 431 successfully loaded 8 ZeRO state_dicts for rank 276 successfully loaded 8 ZeRO 
state_dicts for rank 175 successfully loaded 8 ZeRO state_dicts for rank 435 successfully loaded 8 ZeRO state_dicts for rank 309 loading 8 zero partition checkpoints for rank 336 successfully loaded 8 ZeRO state_dicts for rank 335 successfully loaded 8 ZeRO state_dicts for rank 172 successfully loaded 8 ZeRO state_dicts for rank 412 successfully loaded 8 ZeRO state_dicts for rank 217 successfully loaded 8 ZeRO state_dicts for rank 438 successfully loaded 8 ZeRO state_dicts for rank 426 successfully loaded 8 ZeRO state_dicts for rank 317 successfully loaded 8 ZeRO state_dicts for rank 176 successfully loaded 8 ZeRO state_dicts for rank 260 successfully loaded 8 ZeRO state_dicts for rank 240 successfully loaded 8 ZeRO state_dicts for rank 143 successfully loaded 8 ZeRO state_dicts for rank 120 successfully loaded 8 ZeRO state_dicts for rank 354 successfully loaded 8 ZeRO state_dicts for rank 239 successfully loaded 8 ZeRO state_dicts for rank 228 successfully loaded 8 ZeRO state_dicts for rank 193 successfully loaded 8 ZeRO state_dicts for rank 289 successfully loaded 8 ZeRO state_dicts for rank 70 successfully loaded 8 ZeRO state_dicts for rank 247 successfully loaded 8 ZeRO state_dicts for rank 63 successfully loaded 8 ZeRO state_dicts for rank 139 successfully loaded 8 ZeRO state_dicts for rank 439 loading 8 zero partition checkpoints for rank 84 loading 8 zero partition checkpoints for rank 385 successfully loaded 8 ZeRO state_dicts for rank 173 successfully loaded 8 ZeRO state_dicts for rank 396 successfully loaded 8 ZeRO state_dicts for rank 355 successfully loaded 8 ZeRO state_dicts for rank 141 successfully loaded 8 ZeRO state_dicts for rank 8 successfully loaded 8 ZeRO state_dicts for rank 164 successfully loaded 8 ZeRO state_dicts for rank 148 successfully loaded 8 ZeRO state_dicts for rank 177 successfully loaded 8 ZeRO state_dicts for rank 312 successfully loaded 8 ZeRO state_dicts for rank 244 successfully loaded 8 ZeRO state_dicts for rank 252 
successfully loaded 8 ZeRO state_dicts for rank 369 successfully loaded 8 ZeRO state_dicts for rank 149 successfully loaded 8 ZeRO state_dicts for rank 351 successfully loaded 8 ZeRO state_dicts for rank 167 loading 8 zero partition checkpoints for rank 337 successfully loaded 8 ZeRO state_dicts for rank 79 successfully loaded 8 ZeRO state_dicts for rank 334 successfully loaded 8 ZeRO state_dicts for rank 390 successfully loaded 8 ZeRO state_dicts for rank 427 successfully loaded 8 ZeRO state_dicts for rank 122 successfully loaded 8 ZeRO state_dicts for rank 156 successfully loaded 8 ZeRO state_dicts for rank 64 successfully loaded 8 ZeRO state_dicts for rank 97 loading 8 zero partition checkpoints for rank 436 successfully loaded 8 ZeRO state_dicts for rank 263 successfully loaded 8 ZeRO state_dicts for rank 142 successfully loaded 8 ZeRO state_dicts for rank 68 successfully loaded 8 ZeRO state_dicts for rank 157 successfully loaded 8 ZeRO state_dicts for rank 377 successfully loaded 8 ZeRO state_dicts for rank 352 loading 8 zero partition checkpoints for rank 376 successfully loaded 8 ZeRO state_dicts for rank 195 successfully loaded 8 ZeRO state_dicts for rank 231 successfully loaded 8 ZeRO state_dicts for rank 291 successfully loaded 8 ZeRO state_dicts for rank 77 loading 8 zero partition checkpoints for rank 445 loading 8 zero partition checkpoints for rank 428 successfully loaded 8 ZeRO state_dicts for rank 290 loading 8 zero partition checkpoints for rank 416 successfully loaded 8 ZeRO state_dicts for rank 127 successfully loaded 8 ZeRO state_dicts for rank 137 successfully loaded 8 ZeRO state_dicts for rank 61 successfully loaded 8 ZeRO state_dicts for rank 105 successfully loaded 8 ZeRO state_dicts for rank 62 successfully loaded 8 ZeRO state_dicts for rank 414 successfully loaded 8 ZeRO state_dicts for rank 212 successfully loaded 8 ZeRO state_dicts for rank 262 successfully loaded 8 ZeRO state_dicts for rank 468 successfully loaded 8 ZeRO state_dicts for 
rank 395 loading 8 zero partition checkpoints for rank 198 loading 8 zero partition checkpoints for rank 388 successfully loaded 8 ZeRO state_dicts for rank 87 successfully loaded 8 ZeRO state_dicts for rank 253 loading 8 zero partition checkpoints for rank 88 successfully loaded 8 ZeRO state_dicts for rank 189 successfully loaded 8 ZeRO state_dicts for rank 205 successfully loaded 8 ZeRO state_dicts for rank 166 successfully loaded 8 ZeRO state_dicts for rank 404 successfully loaded 8 ZeRO state_dicts for rank 417 successfully loaded 8 ZeRO state_dicts for rank 130 successfully loaded 8 ZeRO state_dicts for rank 288 successfully loaded 8 ZeRO state_dicts for rank 159 successfully loaded 8 ZeRO state_dicts for rank 179 successfully loaded 8 ZeRO state_dicts for rank 60 successfully loaded 8 ZeRO state_dicts for rank 402 successfully loaded 8 ZeRO state_dicts for rank 349 successfully loaded 8 ZeRO state_dicts for rank 188 successfully loaded 8 ZeRO state_dicts for rank 410 successfully loaded 8 ZeRO state_dicts for rank 220 successfully loaded 8 ZeRO state_dicts for rank 101 successfully loaded 8 ZeRO state_dicts for rank 398 successfully loaded 8 ZeRO state_dicts for rank 281 successfully loaded 8 ZeRO state_dicts for rank 254 successfully loaded 8 ZeRO state_dicts for rank 474 successfully loaded 8 ZeRO state_dicts for rank 333 successfully loaded 8 ZeRO state_dicts for rank 358 successfully loaded 8 ZeRO state_dicts for rank 363 successfully loaded 8 ZeRO state_dicts for rank 184 successfully loaded 8 ZeRO state_dicts for rank 82 successfully loaded 8 ZeRO state_dicts for rank 80 successfully loaded 8 ZeRO state_dicts for rank 471 successfully loaded 8 ZeRO state_dicts for rank 453 successfully loaded 8 ZeRO state_dicts for rank 345 successfully loaded 8 ZeRO state_dicts for rank 81 successfully loaded 8 ZeRO state_dicts for rank 76 successfully loaded 8 ZeRO state_dicts for rank 85 successfully loaded 8 ZeRO state_dicts for rank 434 successfully loaded 8 ZeRO 
state_dicts for rank 267 successfully loaded 8 ZeRO state_dicts for rank 230 loading 8 zero partition checkpoints for rank 197 successfully loaded 8 ZeRO state_dicts for rank 295 successfully loaded 8 ZeRO state_dicts for rank 353 loading 8 zero partition checkpoints for rank 437 successfully loaded 8 ZeRO state_dicts for rank 273 successfully loaded 8 ZeRO state_dicts for rank 202 successfully loaded 8 ZeRO state_dicts for rank 36 successfully loaded 8 ZeRO state_dicts for rank 470 successfully loaded 8 ZeRO state_dicts for rank 357 successfully loaded 8 ZeRO state_dicts for rank 151 successfully loaded 8 ZeRO state_dicts for rank 301 successfully loaded 8 ZeRO state_dicts for rank 315 loading 8 zero partition checkpoints for rank 174 successfully loaded 8 ZeRO state_dicts for rank 209 successfully loaded 8 ZeRO state_dicts for rank 113 successfully loaded 8 ZeRO state_dicts for rank 160 loading 8 zero partition checkpoints for rank 125 successfully loaded 8 ZeRO state_dicts for rank 201 successfully loaded 8 ZeRO state_dicts for rank 104 loading 8 zero partition checkpoints for rank 248 successfully loaded 8 ZeRO state_dicts for rank 370 successfully loaded 8 ZeRO state_dicts for rank 311 successfully loaded 8 ZeRO state_dicts for rank 11 successfully loaded 8 ZeRO state_dicts for rank 478 successfully loaded 8 ZeRO state_dicts for rank 227 successfully loaded 8 ZeRO state_dicts for rank 183 successfully loaded 8 ZeRO state_dicts for rank 272 successfully loaded 8 ZeRO state_dicts for rank 255 loading 8 zero partition checkpoints for rank 194 successfully loaded 8 ZeRO state_dicts for rank 9 loading 8 zero partition checkpoints for rank 204 successfully loaded 8 ZeRO state_dicts for rank 93 successfully loaded 8 ZeRO state_dicts for rank 399 successfully loaded 8 ZeRO state_dicts for rank 451 successfully loaded 8 ZeRO state_dicts for rank 168 successfully loaded 8 ZeRO state_dicts for rank 200 successfully loaded 8 ZeRO state_dicts for rank 316 loading 8 zero 
partition checkpoints for rank 158 successfully loaded 8 ZeRO state_dicts for rank 91 successfully loaded 8 ZeRO state_dicts for rank 73 loading 8 zero partition checkpoints for rank 441 successfully loaded 8 ZeRO state_dicts for rank 418 successfully loaded 8 ZeRO state_dicts for rank 448 successfully loaded 8 ZeRO state_dicts for rank 187 successfully loaded 8 ZeRO state_dicts for rank 356 successfully loaded 8 ZeRO state_dicts for rank 269 loading 8 zero partition checkpoints for rank 299 successfully loaded 8 ZeRO state_dicts for rank 131 successfully loaded 8 ZeRO state_dicts for rank 361 loading 8 zero partition checkpoints for rank 277 successfully loaded 8 ZeRO state_dicts for rank 39 successfully loaded 8 ZeRO state_dicts for rank 350 loading 8 zero partition checkpoints for rank 391 loading 8 zero partition checkpoints for rank 297 successfully loaded 8 ZeRO state_dicts for rank 107 loading 8 zero partition checkpoints for rank 234 loading 8 zero partition checkpoints for rank 242 successfully loaded 8 ZeRO state_dicts for rank 318 successfully loaded 8 ZeRO state_dicts for rank 373 successfully loaded 8 ZeRO state_dicts for rank 475 successfully loaded 8 ZeRO state_dicts for rank 103 successfully loaded 8 ZeRO state_dicts for rank 472 successfully loaded 8 ZeRO state_dicts for rank 221 successfully loaded 8 ZeRO state_dicts for rank 210 loading 8 zero partition checkpoints for rank 192 successfully loaded 8 ZeRO state_dicts for rank 368 successfully loaded 8 ZeRO state_dicts for rank 140 successfully loaded 8 ZeRO state_dicts for rank 268 successfully loaded 8 ZeRO state_dicts for rank 456 successfully loaded 8 ZeRO state_dicts for rank 455 successfully loaded 8 ZeRO state_dicts for rank 321 successfully loaded 8 ZeRO state_dicts for rank 462 successfully loaded 8 ZeRO state_dicts for rank 284 successfully loaded 8 ZeRO state_dicts for rank 117 successfully loaded 8 ZeRO state_dicts for rank 41 loading 8 zero partition checkpoints for rank 394 
successfully loaded 8 ZeRO state_dicts for rank 359 successfully loaded 8 ZeRO state_dicts for rank 375 successfully loaded 8 ZeRO state_dicts for rank 215 successfully loaded 8 ZeRO state_dicts for rank 181 loading 8 zero partition checkpoints for rank 423 successfully loaded 8 ZeRO state_dicts for rank 10 successfully loaded 8 ZeRO state_dicts for rank 100 successfully loaded 8 ZeRO state_dicts for rank 191 loading 8 zero partition checkpoints for rank 178 successfully loaded 8 ZeRO state_dicts for rank 294 loading 8 zero partition checkpoints for rank 332 successfully loaded 8 ZeRO state_dicts for rank 207 successfully loaded 8 ZeRO state_dicts for rank 371 loading 8 zero partition checkpoints for rank 401 successfully loaded 8 ZeRO state_dicts for rank 203 successfully loaded 8 ZeRO state_dicts for rank 37 successfully loaded 8 ZeRO state_dicts for rank 324 loading 8 zero partition checkpoints for rank 241 loading 8 zero partition checkpoints for rank 422 loading 8 zero partition checkpoints for rank 199 successfully loaded 8 ZeRO state_dicts for rank 35 successfully loaded 8 ZeRO state_dicts for rank 322 successfully loaded 8 ZeRO state_dicts for rank 258 successfully loaded 8 ZeRO state_dicts for rank 329 successfully loaded 8 ZeRO state_dicts for rank 222 successfully loaded 8 ZeRO state_dicts for rank 460 loading 8 zero partition checkpoints for rank 380 loading 8 zero partition checkpoints for rank 421 successfully loaded 8 ZeRO state_dicts for rank 323 loading 8 zero partition checkpoints for rank 256 loading 8 zero partition checkpoints for rank 433 loading 8 zero partition checkpoints for rank 229 successfully loaded 8 ZeRO state_dicts for rank 302 loading 8 zero partition checkpoints for rank 265 successfully loaded 8 ZeRO state_dicts for rank 74 successfully loaded 8 ZeRO state_dicts for rank 144 successfully loaded 8 ZeRO state_dicts for rank 223 successfully loaded 8 ZeRO state_dicts for rank 225 loading 8 zero partition checkpoints for rank 153 
successfully loaded 8 ZeRO state_dicts for rank 72 successfully loaded 8 ZeRO state_dicts for rank 138 successfully loaded 8 ZeRO state_dicts for rank 190 loading 8 zero partition checkpoints for rank 246 successfully loaded 8 ZeRO state_dicts for rank 118 successfully loaded 8 ZeRO state_dicts for rank 406 successfully loaded 8 ZeRO state_dicts for rank 413 successfully loaded 8 ZeRO state_dicts for rank 397 successfully loaded 8 ZeRO state_dicts for rank 264 loading 8 zero partition checkpoints for rank 429 successfully loaded 8 ZeRO state_dicts for rank 275 loading 8 zero partition checkpoints for rank 237 loading 8 zero partition checkpoints for rank 403 loading 8 zero partition checkpoints for rank 378 loading 8 zero partition checkpoints for rank 232 successfully loaded 8 ZeRO state_dicts for rank 71 loading 8 zero partition checkpoints for rank 257 loading 8 zero partition checkpoints for rank 389 successfully loaded 8 ZeRO state_dicts for rank 115 successfully loaded 8 ZeRO state_dicts for rank 111 successfully loaded 8 ZeRO state_dicts for rank 108 successfully loaded 8 ZeRO state_dicts for rank 66 successfully loaded 8 ZeRO state_dicts for rank 213 successfully loaded 8 ZeRO state_dicts for rank 186 successfully loaded 8 ZeRO state_dicts for rank 43 successfully loaded 8 ZeRO state_dicts for rank 304 successfully loaded 8 ZeRO state_dicts for rank 211 loading 8 zero partition checkpoints for rank 393 successfully loaded 8 ZeRO state_dicts for rank 347 loading 8 zero partition checkpoints for rank 443 loading 8 zero partition checkpoints for rank 386 successfully loaded 8 ZeRO state_dicts for rank 314 successfully loaded 8 ZeRO state_dicts for rank 208 successfully loaded 8 ZeRO state_dicts for rank 459 successfully loaded 8 ZeRO state_dicts for rank 165 successfully loaded 8 ZeRO state_dicts for rank 419 loading 8 zero partition checkpoints for rank 278 successfully loaded 8 ZeRO state_dicts for rank 83 successfully loaded 8 ZeRO state_dicts for rank 362 
loading 8 zero partition checkpoints for rank 367 loading 8 zero partition checkpoints for rank 180 successfully loaded 8 ZeRO state_dicts for rank 163 successfully loaded 8 ZeRO state_dicts for rank 214 successfully loaded 8 ZeRO state_dicts for rank 116 successfully loaded 8 ZeRO state_dicts for rank 303 successfully loaded 8 ZeRO state_dicts for rank 374 loading 8 zero partition checkpoints for rank 126 loading 8 zero partition checkpoints for rank 339 successfully loaded 8 ZeRO state_dicts for rank 274 loading 8 zero partition checkpoints for rank 292 loading 8 zero partition checkpoints for rank 128 successfully loaded 8 ZeRO state_dicts for rank 114 loading 8 zero partition checkpoints for rank 206 successfully loaded 8 ZeRO state_dicts for rank 372 successfully loaded 8 ZeRO state_dicts for rank 449 successfully loaded 8 ZeRO state_dicts for rank 40 successfully loaded 8 ZeRO state_dicts for rank 409 loading 8 zero partition checkpoints for rank 69 loading 8 zero partition checkpoints for rank 298 loading 8 zero partition checkpoints for rank 123 successfully loaded 8 ZeRO state_dicts for rank 3 loading 8 zero partition checkpoints for rank 366 loading 8 zero partition checkpoints for rank 279 successfully loaded 8 ZeRO state_dicts for rank 47 successfully loaded 8 ZeRO state_dicts for rank 259 successfully loaded 8 ZeRO state_dicts for rank 479 loading 8 zero partition checkpoints for rank 235 successfully loaded 8 ZeRO state_dicts for rank 67 loading 8 zero partition checkpoints for rank 447 successfully loaded 8 ZeRO state_dicts for rank 95 successfully loaded 8 ZeRO state_dicts for rank 270 loading 8 zero partition checkpoints for rank 96 successfully loaded 8 ZeRO state_dicts for rank 129 successfully loaded 8 ZeRO state_dicts for rank 75 successfully loaded 8 ZeRO state_dicts for rank 466 successfully loaded 8 ZeRO state_dicts for rank 226 loading 8 zero partition checkpoints for rank 216 successfully loaded 8 ZeRO state_dicts for rank 224 successfully 
loaded 8 ZeRO state_dicts for rank 280 loading 8 zero partition checkpoints for rank 285 loading 8 zero partition checkpoints for rank 341 successfully loaded 8 ZeRO state_dicts for rank 92 loading 8 zero partition checkpoints for rank 251 successfully loaded 8 ZeRO state_dicts for rank 29 successfully loaded 8 ZeRO state_dicts for rank 411 successfully loaded 8 ZeRO state_dicts for rank 507 loading 8 zero partition checkpoints for rank 408 successfully loaded 8 ZeRO state_dicts for rank 171 loading 8 zero partition checkpoints for rank 446 successfully loaded 8 ZeRO state_dicts for rank 146 loading 8 zero partition checkpoints for rank 340 loading 8 zero partition checkpoints for rank 245 loading 8 zero partition checkpoints for rank 430 successfully loaded 8 ZeRO state_dicts for rank 327 successfully loaded 8 ZeRO state_dicts for rank 331 loading 8 zero partition checkpoints for rank 381 loading 8 zero partition checkpoints for rank 364 successfully loaded 8 ZeRO state_dicts for rank 20 loading 8 zero partition checkpoints for rank 132 successfully loaded 8 ZeRO state_dicts for rank 282 loading 8 zero partition checkpoints for rank 233 loading 8 zero partition checkpoints for rank 243 successfully loaded 8 ZeRO state_dicts for rank 452 loading 8 zero partition checkpoints for rank 431 successfully loaded 8 ZeRO state_dicts for rank 305 successfully loaded 8 ZeRO state_dicts for rank 21 successfully loaded 8 ZeRO state_dicts for rank 169 loading 8 zero partition checkpoints for rank 86 loading 8 zero partition checkpoints for rank 442 successfully loaded 8 ZeRO state_dicts for rank 330 loading 8 zero partition checkpoints for rank 124 successfully loaded 8 ZeRO state_dicts for rank 286 loading 8 zero partition checkpoints for rank 175 successfully loaded 8 ZeRO state_dicts for rank 326 successfully loaded 8 ZeRO state_dicts for rank 454 loading 8 zero partition checkpoints for rank 155 successfully loaded 8 ZeRO state_dicts for rank 476 successfully loaded 8 ZeRO 
state_dicts for rank 102 successfully loaded 8 ZeRO state_dicts for rank 300 loading 8 zero partition checkpoints for rank 250 successfully loaded 8 ZeRO state_dicts for rank 1 loading 8 zero partition checkpoints for rank 435 successfully loaded 8 ZeRO state_dicts for rank 15 successfully loaded 8 ZeRO state_dicts for rank 38 successfully loaded 8 ZeRO state_dicts for rank 328 successfully loaded 8 ZeRO state_dicts for rank 0 loading 8 zero partition checkpoints for rank 172 successfully loaded 8 ZeRO state_dicts for rank 463 loading 8 zero partition checkpoints for rank 219 successfully loaded 8 ZeRO state_dicts for rank 320 loading 8 zero partition checkpoints for rank 218 successfully loaded 8 ZeRO state_dicts for rank 56 successfully loaded 8 ZeRO state_dicts for rank 271 loading 8 zero partition checkpoints for rank 150 successfully loaded 8 ZeRO state_dicts for rank 287 loading 8 zero partition checkpoints for rank 309 successfully loaded 8 ZeRO state_dicts for rank 19 successfully loaded 8 ZeRO state_dicts for rank 24 successfully loaded 8 ZeRO state_dicts for rank 27 successfully loaded 8 ZeRO state_dicts for rank 112 successfully loaded 8 ZeRO state_dicts for rank 415 successfully loaded 8 ZeRO state_dicts for rank 310 loading 8 zero partition checkpoints for rank 365 loading 8 zero partition checkpoints for rank 240 successfully loaded 8 ZeRO state_dicts for rank 78 loading 8 zero partition checkpoints for rank 260 loading 8 zero partition checkpoints for rank 342 loading 8 zero partition checkpoints for rank 313 loading 8 zero partition checkpoints for rank 438 successfully loaded 8 ZeRO state_dicts for rank 161 successfully loaded 8 ZeRO state_dicts for rank 308 successfully loaded 8 ZeRO state_dicts for rank 32 successfully loaded 8 ZeRO state_dicts for rank 344 loading 8 zero partition checkpoints for rank 289 successfully loaded 8 ZeRO state_dicts for rank 185 successfully loaded 8 ZeRO state_dicts for rank 49 successfully loaded 8 ZeRO state_dicts 
successfully loaded 8 ZeRO state_dicts for rank 57
loading 8 zero partition checkpoints for rank 57
[... the same pair of messages repeats, interleaved across processes, once for each of the 512 ranks (0-511); the remaining copies are elided here ...]
checkpoint version 3.0
successfully loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints at iteration 9768
time (ms) | load-checkpoint: 91243.56
[after model, optimizer, and learning rate scheduler are built] datetime: 2021-09-27 03:56:36
> building train, validation, and test datasets ...
> datasets target sizes (minimum size):
    train:      300000000
    validation: 1638400
    test:       10240
> building train, validation, and test datasets for GPT ...
 > building dataset index ...
    reading sizes...
    reading pointers...
    reading document index...
    creating numpy buffer of mmap...
    creating memory view of numpy buffer...
 > finished creating indexed dataset in 0.143013 seconds
    number of documents: 304230423
 > dataset split:
    train:      document indices in [0, 288714672) total of 288714672 documents
    validation: document indices in [288714672, 303926193) total of 15211521 documents
    test:       document indices in [303926193, 304230423) total of 304230 documents
 > loading doc-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_train_indexmap_300000000ns_2048sl_42s_doc_idx.npy
 > loading sample-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_train_indexmap_300000000ns_2048sl_42s_sample_idx.npy
 > loading shuffle-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_train_indexmap_300000000ns_2048sl_42s_shuffle_idx.npy
    loaded indexed file in 0.289 seconds
    total number of samples: 394611670
    total number of epochs: 3
 > loading doc-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_valid_indexmap_1638400ns_2048sl_42s_doc_idx.npy
 > loading sample-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_valid_indexmap_1638400ns_2048sl_42s_sample_idx.npy
 > loading shuffle-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_valid_indexmap_1638400ns_2048sl_42s_shuffle_idx.npy
    loaded indexed file in 0.388 seconds
    total number of samples: 6927161
    total number of epochs: 1
 > loading doc-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_test_indexmap_10240ns_2048sl_42s_doc_idx.npy
 > loading sample-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_test_indexmap_10240ns_2048sl_42s_sample_idx.npy
 > loading shuffle-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_test_indexmap_10240ns_2048sl_42s_shuffle_idx.npy
    loaded indexed file in 0.061 seconds
    total number of samples: 137384
    total number of epochs: 1
> finished creating GPT datasets ...
[after dataloaders are built] datetime: 2021-09-27 03:56:43
done with setup ...
training ...
time (ms) | model-and-optimizer-setup: 102057.80 | train/valid/test-data-iterators-setup: 5731.66
[before the start of training step] datetime: 2021-09-27 03:56:43
[2021-09-27 03:56:43,457] [INFO] [checkpointing.py:408:forward] Activation Checkpointing Information
[2021-09-27 03:56:43,457] [INFO] [checkpointing.py:409:forward] ----Partition Activations False, CPU CHECKPOINTING False
[2021-09-27 03:56:43,457] [INFO] [checkpointing.py:412:forward] ----contiguous Memory Checkpointing False with 32 total layers
[2021-09-27 03:56:43,457] [INFO] [checkpointing.py:415:forward] ----Synchronization False
[2021-09-27 03:56:43,457] [INFO] [checkpointing.py:416:forward] ----Profiling time in checkpointing False
[Rank 192] (after 9770 iterations) memory (MB) | allocated: 4613.21923828125 | max allocated: 10290.1357421875 | reserved: 15132.0 | max reserved: 15132.0
[Rank 129] (after 9770 iterations) memory (MB) | allocated: 4613.21923828125 | max allocated: 10562.13623046875 | reserved: 15500.0 | max reserved: 15500.0
[Rank 130] (after 9770 iterations) memory (MB) | allocated: 4613.21923828125 | max allocated: 10562.13623046875 | reserved: 15364.0 | max reserved: 15364.0
[Rank 64] (after 9770 iterations) memory (MB) | allocated: 4613.21923828125 | max allocated: 10834.13671875 | reserved: 15820.0 | max reserved: 15820.0
[Rank 0] (after 9770 iterations) memory (MB) | allocated: 5267.49951171875 | max allocated: 12476.68310546875 | reserved: 18256.0 | max reserved: 18256.0
[Rank 2] (after 9770 iterations) memory (MB) | allocated: 5267.49951171875 | max allocated: 12476.68310546875 | reserved: 17788.0 | max reserved: 17788.0
[Rank 256] (after 9770 iterations) memory (MB) | allocated: 4613.21923828125 | max allocated: 10018.13525390625 | reserved: 14812.0 | max reserved: 14812.0
[Rank 257] (after 9770 iterations) memory (MB) | allocated: 4613.21923828125 | max allocated: 10018.13525390625 | reserved: 14940.0 | max reserved: 14940.0
[Rank 193] (after 9770 iterations) memory (MB) | allocated: 4613.21923828125 | max allocated: 10290.1357421875 | reserved: 15096.0 | max reserved: 15096.0
[Rank 194] (after 9770 iterations) memory (MB) | allocated: 4613.21923828125 | max allocated: 10290.1357421875 | reserved: 15112.0 | max reserved: 15112.0
[Rank 128] (after 9770 iterations) memory (MB) | allocated: 4613.21923828125 | max allocated: 10562.13623046875 | reserved: 15456.0 | max reserved: 15456.0
[Rank 385] (after 9770 iterations) memory (MB) | allocated: 4613.21923828125 | max allocated: 9474.13427734375 | reserved: 14312.0 | max reserved: 14312.0
[Rank 320] (after 9770 iterations) memory (MB) | allocated: 4613.21923828125 | max allocated: 9746.134765625 | reserved: 14716.0 | max reserved: 14716.0
[Rank 65] (after 9770 iterations) memory (MB) | allocated: 4613.21923828125 | max allocated: 10834.13671875 | reserved: 15632.0 | max reserved: 15632.0
[Rank 1] (after 9770 iterations) memory (MB) | allocated: 5267.49951171875 | max allocated: 12476.68310546875 | reserved: 18256.0 | max reserved: 18256.0
[Rank 258] (after 9770 iterations) memory (MB) | allocated: 4613.21923828125 | max allocated: 10018.13525390625 | reserved: 14696.0 | max reserved: 14696.0
[Rank 131] (after 9770 iterations) memory (MB) | allocated: 4613.21923828125 | max allocated: 10562.13623046875 | reserved: 15532.0 | max reserved: 15532.0
[Rank 384] (after 9770 iterations) memory (MB) | allocated: 4613.21923828125 | max allocated: 9474.13427734375 | reserved: 14268.0 | max reserved: 14268.0
[Rank 449] (after 9770 iterations) memory (MB) | allocated: 5685.35986328125 | max allocated: 10463.337890625 | reserved: 15736.0 | max reserved: 15736.0
[Rank 448] (after 9770 iterations) memory (MB) | allocated: 5685.35986328125 | max allocated: 10463.33642578125 | reserved: 15736.0 | max reserved: 15736.0
[Rank 322] (after 9770 iterations) memory (MB) | allocated: 4613.21923828125 | max allocated: 9746.134765625 | reserved: 14616.0 | max reserved: 14616.0
[Rank 66] (after 9770 iterations) memory (MB) | allocated: 4613.21923828125 | max allocated: 10834.13671875 | reserved: 15828.0 | max reserved: 15828.0
[Rank 3] (after 9770 iterations) memory (MB) | allocated: 5267.49951171875 | max allocated: 12476.68310546875 | reserved: 18256.0 | max reserved: 18256.0
[Rank 259] (after 9770 iterations) memory (MB) | allocated: 4613.21923828125 | max allocated: 10018.13525390625 | reserved: 14712.0 | max reserved: 14712.0
[Rank 195] (after 9770 iterations) memory (MB) | allocated: 4613.21923828125 | max allocated: 10290.1357421875 | reserved: 15208.0 | max reserved: 15208.0
[Rank 387] (after 9770 iterations) memory (MB) | allocated: 4613.21923828125 | max allocated: 9474.13427734375 | reserved: 14312.0 | max reserved: 14312.0
[Rank 386] (after 9770 iterations) memory (MB) | allocated: 4613.21923828125 | max allocated: 9474.13427734375 | reserved: 14312.0 | max reserved: 14312.0
[Rank 451] (after 9770 iterations) memory (MB) | allocated: 5685.35986328125 | max allocated: 10463.3369140625 | reserved: 15736.0 | max reserved: 15736.0
[Rank 323] (after 9770 iterations) memory (MB) | allocated: 4613.21923828125 | max allocated: 9746.134765625 | reserved: 14648.0 | max reserved: 14648.0
[Rank 67] (after 9770 iterations) memory (MB) | allocated: 4613.21923828125 | max allocated: 10834.13671875 | reserved: 15536.0 | max reserved: 15536.0
[Rank 450] (after 9770 iterations) memory (MB) | allocated: 5685.35986328125 | max allocated: 10463.33544921875 | reserved: 15736.0 | max reserved: 15736.0
[Rank 321] (after 9770 iterations) memory (MB) | allocated: 4613.21923828125 | max allocated: 9746.134765625 | reserved: 14684.0 | max reserved: 14684.0
 iteration 9770/ 159576 | consumed samples: 701760 | elapsed time per iteration (ms): 21146.4 | learning rate: 6.000E-05 | global batch size: 240 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
 iteration 9780/ 159576 | consumed samples: 704160 | elapsed time per iteration (ms): 13340.2 | learning rate: 6.000E-05 | global batch size: 240 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
 iteration 9790/ 159576 | consumed samples: 706560 | elapsed time per iteration (ms): 13419.1 | learning rate: 6.000E-05 | global batch size: 240 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
 iteration 9800/ 159576 | consumed samples: 708976 | elapsed time per iteration (ms): 13591.3 | learning rate: 6.000E-05 | global batch size: 256 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
 iteration 9810/ 159576 | consumed samples: 711536 | elapsed time per iteration (ms): 13986.8 | learning rate: 6.000E-05 | global batch size: 256 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
 iteration 9820/ 159576 | consumed samples: 714096 | elapsed time per iteration (ms): 14105.8 | learning rate: 6.000E-05 | global batch size: 256 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
 iteration 9830/ 159576 | consumed samples: 716656 | elapsed time per iteration (ms): 14030.2 | learning rate: 6.000E-05 | global batch size: 256 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
 iteration 9840/ 159576 | consumed samples: 719216 | elapsed time per iteration (ms): 14188.9 | learning rate: 6.000E-05 | global batch size: 256 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
[2021-09-27 04:14:28] PULSE: tr8-104B is running for 20:12 since 2021-09-27T03:54:16 (1188168 on 'gpu_p13' partition (r6i5n[7-8],r6i6n0,r6i7n[7-8],r7i0n[0-5],r7i1n[7-8],r7i2n[0-1,5,8],r7i3n2,r7i5n7,r7i6n[1-4,8],r7i7n[0-4,6-8],r8i0n[0-8],r8i1n[0-4],r8i2n8,r8i3n[0-3,8],r8i4n[0-1],r8i6n[2-3,5-6],r8i7n[3-8],r9i0n[0-6,8],r9i1n[0-8],r9i2n[0,3-8],r9i3n[0-2,6-8],r9i4n[0-6,8],r9i5n[0-8],r9i6n[0-8],r9i7n[1-8])
 iteration 9850/ 159576 | consumed samples: 721776 | elapsed time per iteration (ms): 14071.1 | learning rate: 6.000E-05 | global batch size: 256 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
 iteration 9860/ 159576 | consumed samples: 724336 | elapsed time per iteration (ms): 14125.1 | learning rate: 6.000E-05 | global batch size: 256 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
 iteration 9870/ 159576 | consumed samples: 726896 | elapsed time per iteration (ms): 14170.2 | learning rate: 6.000E-05 | global batch size: 256 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
 iteration 9880/ 159576 | consumed samples: 729456 | elapsed time per iteration (ms): 14139.5 | learning rate: 6.000E-05 | global batch size: 256 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
 iteration 9890/ 159576 | consumed samples: 732016 | elapsed time per iteration (ms): 14156.0 | learning rate: 6.000E-05 | global batch size: 256 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
 iteration 9900/ 159576 | consumed samples: 734576 | elapsed time per iteration (ms): 14057.9 | learning rate: 6.000E-05 | global batch size: 256 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
 iteration 9910/ 159576 | consumed samples: 737136 | elapsed time per iteration (ms): 14129.8 | learning rate: 6.000E-05 | global batch size: 256 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
 iteration 9920/ 159576 | consumed samples: 739696 | elapsed time per iteration (ms): 14157.7 | learning rate: 6.000E-05 | global batch size: 256 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
 iteration 9930/ 159576 | consumed samples: 742256 | elapsed time per iteration (ms): 14024.1 | learning rate: 6.000E-05 | global batch size: 256 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
 iteration 9940/ 159576 | consumed samples: 744816 | elapsed time per iteration (ms): 13971.4 | learning rate: 6.000E-05 | global batch size: 256 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
 iteration 9950/ 159576 | consumed samples: 747376 | elapsed time per iteration (ms): 14101.5 | learning rate: 6.000E-05 | global batch size: 256 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
 iteration 9960/ 159576 | consumed samples: 749936 | elapsed time per iteration (ms): 14210.0 | learning rate: 6.000E-05 | global batch size: 256 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
 iteration 9970/ 159576 | consumed samples: 752496 | elapsed time per iteration (ms): 14219.6 | learning rate: 6.000E-05 | global batch size: 256 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 
| time (ms) iteration 9980/ 159576 | consumed samples: 755056 | elapsed time per iteration (ms): 14117.6 | learning rate: 6.000E-05 | global batch size: 256 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 9990/ 159576 | consumed samples: 757712 | elapsed time per iteration (ms): 14400.0 | learning rate: 6.000E-05 | global batch size: 272 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) [2021-09-27 04:51:19,357] [INFO] [logging.py:68:log_dist] [Rank 0] step=10000, skipped=1052, lr=[5.999919375575235e-05, 5.999919375575235e-05], mom=[(0.9, 0.999), (0.9, 0.999)] steps: 10000 loss: nan iter time (s): 0.007 samples/sec: 37472.688 iteration 10000/ 159576 | consumed samples: 760432 | elapsed time per iteration (ms): 14648.0 | learning rate: 6.000E-05 | global batch size: 272 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) ------------------------------------------------------------------------------------------------- validation loss at iteration 10000 | lm loss value: 7.270623E+00 | lm loss PPL: 1.437445E+03 | ------------------------------------------------------------------------------------------------- iteration 10010/ 159576 | consumed samples: 763152 | elapsed time per iteration (ms): 16469.3 | learning rate: 6.000E-05 | global batch size: 272 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 10020/ 159576 | consumed samples: 765872 | elapsed time per iteration (ms): 14573.2 | learning rate: 6.000E-05 | global batch size: 272 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 10030/ 159576 | consumed samples: 768592 | elapsed time per 
iteration (ms): 14611.8 | learning rate: 6.000E-05 | global batch size: 272 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 10040/ 159576 | consumed samples: 771312 | elapsed time per iteration (ms): 14782.8 | learning rate: 6.000E-05 | global batch size: 272 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 10050/ 159576 | consumed samples: 774032 | elapsed time per iteration (ms): 14722.8 | learning rate: 6.000E-05 | global batch size: 272 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 10060/ 159576 | consumed samples: 776752 | elapsed time per iteration (ms): 14595.9 | learning rate: 6.000E-05 | global batch size: 272 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 10070/ 159576 | consumed samples: 779472 | elapsed time per iteration (ms): 14712.5 | learning rate: 6.000E-05 | global batch size: 272 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 10080/ 159576 | consumed samples: 782192 | elapsed time per iteration (ms): 14640.3 | learning rate: 6.000E-05 | global batch size: 272 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 10090/ 159576 | consumed samples: 784912 | elapsed time per iteration (ms): 15060.9 | learning rate: 6.000E-05 | global batch size: 272 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) [2021-09-27 05:14:32] PULSE: tr8-104B is running for 1:20:16 since 2021-09-27T03:54:16 (1188168 on 'gpu_p13' partition 
(r6i5n[7-8],r6i6n0,r6i7n[7-8],r7i0n[0-5],r7i1n[7-8],r7i2n[0-1,5,8],r7i3n2,r7i5n7,r7i6n[1-4,8],r7i7n[0-4,6-8],r8i0n[0-8],r8i1n[0-4],r8i2n8,r8i3n[0-3,8],r8i4n[0-1],r8i6n[2-3,5-6],r8i7n[3-8],r9i0n[0-6,8],r9i1n[0-8],r9i2n[0,3-8],r9i3n[0-2,6-8],r9i4n[0-6,8],r9i5n[0-8],r9i6n[0-8],r9i7n[1-8]) iteration 10100/ 159576 | consumed samples: 787632 | elapsed time per iteration (ms): 14624.0 | learning rate: 6.000E-05 | global batch size: 272 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 10110/ 159576 | consumed samples: 790352 | elapsed time per iteration (ms): 14621.7 | learning rate: 6.000E-05 | global batch size: 272 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 10120/ 159576 | consumed samples: 793072 | elapsed time per iteration (ms): 14685.1 | learning rate: 6.000E-05 | global batch size: 272 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 10130/ 159576 | consumed samples: 795792 | elapsed time per iteration (ms): 14531.8 | learning rate: 6.000E-05 | global batch size: 272 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 10140/ 159576 | consumed samples: 798512 | elapsed time per iteration (ms): 14629.6 | learning rate: 6.000E-05 | global batch size: 272 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 10150/ 159576 | consumed samples: 801232 | elapsed time per iteration (ms): 14771.8 | learning rate: 6.000E-05 | global batch size: 272 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 10160/ 159576 | consumed samples: 803984 | 
elapsed time per iteration (ms): 14889.9 | learning rate: 6.000E-05 | global batch size: 288 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 10170/ 159576 | consumed samples: 806864 | elapsed time per iteration (ms): 15471.9 | learning rate: 6.000E-05 | global batch size: 288 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 10180/ 159576 | consumed samples: 809744 | elapsed time per iteration (ms): 15228.6 | learning rate: 6.000E-05 | global batch size: 288 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 10190/ 159576 | consumed samples: 812624 | elapsed time per iteration (ms): 15425.1 | learning rate: 6.000E-05 | global batch size: 288 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 10200/ 159576 | consumed samples: 815504 | elapsed time per iteration (ms): 15390.8 | learning rate: 6.000E-05 | global batch size: 288 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 10210/ 159576 | consumed samples: 818384 | elapsed time per iteration (ms): 15293.9 | learning rate: 6.000E-05 | global batch size: 288 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 10220/ 159576 | consumed samples: 821264 | elapsed time per iteration (ms): 15259.9 | learning rate: 6.000E-05 | global batch size: 288 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 10230/ 159576 | consumed samples: 824144 | elapsed time per iteration (ms): 15547.4 | learning rate: 6.000E-05 
| global batch size: 288 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 10240/ 159576 | consumed samples: 827024 | elapsed time per iteration (ms): 15375.5 | learning rate: 6.000E-05 | global batch size: 288 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 10250/ 159576 | consumed samples: 829904 | elapsed time per iteration (ms): 15322.8 | learning rate: 6.000E-05 | global batch size: 288 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 10260/ 159576 | consumed samples: 832784 | elapsed time per iteration (ms): 15280.3 | learning rate: 6.000E-05 | global batch size: 288 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 10270/ 159576 | consumed samples: 835664 | elapsed time per iteration (ms): 15390.4 | learning rate: 6.000E-05 | global batch size: 288 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 10280/ 159576 | consumed samples: 838544 | elapsed time per iteration (ms): 15339.6 | learning rate: 6.000E-05 | global batch size: 288 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 10290/ 159576 | consumed samples: 841424 | elapsed time per iteration (ms): 15252.5 | learning rate: 6.000E-05 | global batch size: 288 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 10300/ 159576 | consumed samples: 844304 | elapsed time per iteration (ms): 15146.5 | learning rate: 6.000E-05 | global batch size: 288 | loss scale: 1.0 | grad norm: 0.000 | num 
zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 10310/ 159576 | consumed samples: 847184 | elapsed time per iteration (ms): 15389.7 | learning rate: 6.000E-05 | global batch size: 288 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 10320/ 159576 | consumed samples: 850064 | elapsed time per iteration (ms): 15348.5 | learning rate: 6.000E-05 | global batch size: 288 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 10330/ 159576 | consumed samples: 853072 | elapsed time per iteration (ms): 15779.0 | learning rate: 6.000E-05 | global batch size: 304 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) [2021-09-27 06:14:35] PULSE: tr8-104B is running for 2:20:19 since 2021-09-27T03:54:16 (1188168 on 'gpu_p13' partition (r6i5n[7-8],r6i6n0,r6i7n[7-8],r7i0n[0-5],r7i1n[7-8],r7i2n[0-1,5,8],r7i3n2,r7i5n7,r7i6n[1-4,8],r7i7n[0-4,6-8],r8i0n[0-8],r8i1n[0-4],r8i2n8,r8i3n[0-3,8],r8i4n[0-1],r8i6n[2-3,5-6],r8i7n[3-8],r9i0n[0-6,8],r9i1n[0-8],r9i2n[0,3-8],r9i3n[0-2,6-8],r9i4n[0-6,8],r9i5n[0-8],r9i6n[0-8],r9i7n[1-8]) iteration 10340/ 159576 | consumed samples: 856112 | elapsed time per iteration (ms): 15864.8 | learning rate: 6.000E-05 | global batch size: 304 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 10350/ 159576 | consumed samples: 859152 | elapsed time per iteration (ms): 15831.6 | learning rate: 6.000E-05 | global batch size: 304 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 10360/ 159576 | consumed samples: 862192 | elapsed time per iteration (ms): 15954.9 | learning rate: 6.000E-05 | 
global batch size: 304 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 10370/ 159576 | consumed samples: 865232 | elapsed time per iteration (ms): 15871.6 | learning rate: 6.000E-05 | global batch size: 304 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 10380/ 159576 | consumed samples: 868272 | elapsed time per iteration (ms): 15850.1 | learning rate: 6.000E-05 | global batch size: 304 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 10390/ 159576 | consumed samples: 871312 | elapsed time per iteration (ms): 15796.9 | learning rate: 6.000E-05 | global batch size: 304 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 10400/ 159576 | consumed samples: 874352 | elapsed time per iteration (ms): 16082.6 | learning rate: 6.000E-05 | global batch size: 304 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 10410/ 159576 | consumed samples: 877392 | elapsed time per iteration (ms): 16036.3 | learning rate: 6.000E-05 | global batch size: 304 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 10420/ 159576 | consumed samples: 880432 | elapsed time per iteration (ms): 15898.1 | learning rate: 6.000E-05 | global batch size: 304 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 10430/ 159576 | consumed samples: 883472 | elapsed time per iteration (ms): 15687.4 | learning rate: 6.000E-05 | global batch size: 304 | loss scale: 1.0 | grad norm: 0.000 | num 
zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 10440/ 159576 | consumed samples: 886512 | elapsed time per iteration (ms): 15579.4 | learning rate: 6.000E-05 | global batch size: 304 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 10450/ 159576 | consumed samples: 889552 | elapsed time per iteration (ms): 16071.4 | learning rate: 6.000E-05 | global batch size: 304 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 10460/ 159576 | consumed samples: 892592 | elapsed time per iteration (ms): 15986.9 | learning rate: 6.000E-05 | global batch size: 304 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 10470/ 159576 | consumed samples: 895632 | elapsed time per iteration (ms): 15775.6 | learning rate: 6.000E-05 | global batch size: 304 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 10480/ 159576 | consumed samples: 898720 | elapsed time per iteration (ms): 16164.1 | learning rate: 6.000E-05 | global batch size: 320 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 10490/ 159576 | consumed samples: 901920 | elapsed time per iteration (ms): 16520.7 | learning rate: 6.000E-05 | global batch size: 320 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 10500/ 159576 | consumed samples: 905120 | elapsed time per iteration (ms): 16597.6 | learning rate: 6.000E-05 | global batch size: 320 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan 
iterations: 0 | time (ms) saving checkpoint at iteration 10500 to /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints [2021-09-27 06:59:42,258] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/global_step10500/mp_rank_00_model_states.pt successfully saved checkpoint at iteration 10500 to /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints time (ms) | save-checkpoint: 21886.11 iteration 10510/ 159576 | consumed samples: 908320 | elapsed time per iteration (ms): 18676.6 | learning rate: 6.000E-05 | global batch size: 320 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 10520/ 159576 | consumed samples: 911520 | elapsed time per iteration (ms): 16429.2 | learning rate: 6.000E-05 | global batch size: 320 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 10530/ 159576 | consumed samples: 914720 | elapsed time per iteration (ms): 16551.8 | learning rate: 6.000E-05 | global batch size: 320 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 10540/ 159576 | consumed samples: 917920 | elapsed time per iteration (ms): 16488.6 | learning rate: 6.000E-05 | global batch size: 320 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 10550/ 159576 | consumed samples: 921120 | elapsed time per iteration (ms): 16385.6 | learning rate: 6.000E-05 | global batch size: 320 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) [2021-09-27 07:14:45] PULSE: tr8-104B is running for 3:20:29 since 2021-09-27T03:54:16 (1188168 on 'gpu_p13' partition 
(r6i5n[7-8],r6i6n0,r6i7n[7-8],r7i0n[0-5],r7i1n[7-8],r7i2n[0-1,5,8],r7i3n2,r7i5n7,r7i6n[1-4,8],r7i7n[0-4,6-8],r8i0n[0-8],r8i1n[0-4],r8i2n8,r8i3n[0-3,8],r8i4n[0-1],r8i6n[2-3,5-6],r8i7n[3-8],r9i0n[0-6,8],r9i1n[0-8],r9i2n[0,3-8],r9i3n[0-2,6-8],r9i4n[0-6,8],r9i5n[0-8],r9i6n[0-8],r9i7n[1-8]) iteration 10560/ 159576 | consumed samples: 924320 | elapsed time per iteration (ms): 16352.3 | learning rate: 6.000E-05 | global batch size: 320 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 10570/ 159576 | consumed samples: 927520 | elapsed time per iteration (ms): 16281.1 | learning rate: 6.000E-05 | global batch size: 320 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 10580/ 159576 | consumed samples: 930720 | elapsed time per iteration (ms): 16433.2 | learning rate: 6.000E-05 | global batch size: 320 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 10590/ 159576 | consumed samples: 933920 | elapsed time per iteration (ms): 16276.4 | learning rate: 6.000E-05 | global batch size: 320 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 10600/ 159576 | consumed samples: 937120 | elapsed time per iteration (ms): 16510.6 | learning rate: 6.000E-05 | global batch size: 320 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 10610/ 159576 | consumed samples: 940320 | elapsed time per iteration (ms): 16415.6 | learning rate: 6.000E-05 | global batch size: 320 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 10620/ 159576 | consumed samples: 943520 | 
elapsed time per iteration (ms): 16211.4 | learning rate: 6.000E-05 | global batch size: 320 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 10630/ 159576 | consumed samples: 946800 | elapsed time per iteration (ms): 16664.6 | learning rate: 6.000E-05 | global batch size: 336 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 10640/ 159576 | consumed samples: 950160 | elapsed time per iteration (ms): 17041.3 | learning rate: 6.000E-05 | global batch size: 336 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 10650/ 159576 | consumed samples: 953520 | elapsed time per iteration (ms): 17363.3 | learning rate: 6.000E-05 | global batch size: 336 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 10660/ 159576 | consumed samples: 956880 | elapsed time per iteration (ms): 16944.5 | learning rate: 6.000E-05 | global batch size: 336 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 10670/ 159576 | consumed samples: 960240 | elapsed time per iteration (ms): 17142.6 | learning rate: 6.000E-05 | global batch size: 336 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 10680/ 159576 | consumed samples: 963600 | elapsed time per iteration (ms): 17139.9 | learning rate: 6.000E-05 | global batch size: 336 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 10690/ 159576 | consumed samples: 966960 | elapsed time per iteration (ms): 17104.6 | learning rate: 6.000E-05 
| global batch size: 336 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 10700/ 159576 | consumed samples: 970320 | elapsed time per iteration (ms): 16968.9 | learning rate: 6.000E-05 | global batch size: 336 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 10710/ 159576 | consumed samples: 973680 | elapsed time per iteration (ms): 17071.1 | learning rate: 6.000E-05 | global batch size: 336 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 10720/ 159576 | consumed samples: 977040 | elapsed time per iteration (ms): 16939.7 | learning rate: 6.000E-05 | global batch size: 336 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 10730/ 159576 | consumed samples: 980400 | elapsed time per iteration (ms): 17182.0 | learning rate: 6.000E-05 | global batch size: 336 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 10740/ 159576 | consumed samples: 983760 | elapsed time per iteration (ms): 16947.4 | learning rate: 6.000E-05 | global batch size: 336 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 10750/ 159576 | consumed samples: 987120 | elapsed time per iteration (ms): 16887.4 | learning rate: 6.000E-05 | global batch size: 336 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 10760/ 159576 | consumed samples: 990480 | elapsed time per iteration (ms): 17060.2 | learning rate: 6.000E-05 | global batch size: 336 | loss scale: 1.0 | grad norm: 0.000 | num 
zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) [2021-09-27 08:14:50] PULSE: tr8-104B is running for 4:20:34 since 2021-09-27T03:54:16 (1188168 on 'gpu_p13' partition (r6i5n[7-8],r6i6n0,r6i7n[7-8],r7i0n[0-5],r7i1n[7-8],r7i2n[0-1,5,8],r7i3n2,r7i5n7,r7i6n[1-4,8],r7i7n[0-4,6-8],r8i0n[0-8],r8i1n[0-4],r8i2n8,r8i3n[0-3,8],r8i4n[0-1],r8i6n[2-3,5-6],r8i7n[3-8],r9i0n[0-6,8],r9i1n[0-8],r9i2n[0,3-8],r9i3n[0-2,6-8],r9i4n[0-6,8],r9i5n[0-8],r9i6n[0-8],r9i7n[1-8]) iteration 10770/ 159576 | consumed samples: 993920 | elapsed time per iteration (ms): 17207.0 | learning rate: 6.000E-05 | global batch size: 352 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 10780/ 159576 | consumed samples: 997440 | elapsed time per iteration (ms): 17439.0 | learning rate: 6.000E-05 | global batch size: 352 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 10790/ 159576 | consumed samples: 1000960 | elapsed time per iteration (ms): 17709.5 | learning rate: 6.000E-05 | global batch size: 352 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 10800/ 159576 | consumed samples: 1004480 | elapsed time per iteration (ms): 17397.4 | learning rate: 6.000E-05 | global batch size: 352 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 10810/ 159576 | consumed samples: 1008000 | elapsed time per iteration (ms): 17515.8 | learning rate: 6.000E-05 | global batch size: 352 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 10820/ 159576 | consumed samples: 1011520 | elapsed time per iteration (ms): 17500.0 | learning rate: 6.000E-05 | 
global batch size: 352 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 10830/ 159576 | consumed samples: 1015040 | elapsed time per iteration (ms): 17623.4 | learning rate: 6.000E-05 | global batch size: 352 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 10840/ 159576 | consumed samples: 1018560 | elapsed time per iteration (ms): 17764.6 | learning rate: 6.000E-05 | global batch size: 352 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 10850/ 159576 | consumed samples: 1022080 | elapsed time per iteration (ms): 17667.0 | learning rate: 6.000E-05 | global batch size: 352 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 10860/ 159576 | consumed samples: 1025600 | elapsed time per iteration (ms): 17590.6 | learning rate: 6.000E-05 | global batch size: 352 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 10870/ 159576 | consumed samples: 1029120 | elapsed time per iteration (ms): 17626.8 | learning rate: 6.000E-05 | global batch size: 352 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 10880/ 159576 | consumed samples: 1032640 | elapsed time per iteration (ms): 17668.3 | learning rate: 6.000E-05 | global batch size: 352 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 10890/ 159576 | consumed samples: 1036160 | elapsed time per iteration (ms): 17624.1 | learning rate: 6.000E-05 | global batch size: 352 | loss scale: 1.0 | grad norm: 0.000 
| num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 10900/ 159576 | consumed samples: 1039680 | elapsed time per iteration (ms): 17793.8 | learning rate: 6.000E-05 | global batch size: 352 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 10910/ 159576 | consumed samples: 1043360 | elapsed time per iteration (ms): 18188.2 | learning rate: 6.000E-05 | global batch size: 368 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 10920/ 159576 | consumed samples: 1047040 | elapsed time per iteration (ms): 18317.3 | learning rate: 6.000E-05 | global batch size: 368 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 10930/ 159576 | consumed samples: 1050720 | elapsed time per iteration (ms): 18324.8 | learning rate: 6.000E-05 | global batch size: 368 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 10940/ 159576 | consumed samples: 1054400 | elapsed time per iteration (ms): 18321.8 | learning rate: 6.000E-05 | global batch size: 368 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 10950/ 159576 | consumed samples: 1058080 | elapsed time per iteration (ms): 18321.0 | learning rate: 6.000E-05 | global batch size: 368 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 10960/ 159576 | consumed samples: 1061760 | elapsed time per iteration (ms): 18223.5 | learning rate: 6.000E-05 | global batch size: 368 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number 
of nan iterations: 0 | time (ms)
[2021-09-27 09:14:51] PULSE: tr8-104B is running for 5:20:35 since 2021-09-27T03:54:16 (1188168 on 'gpu_p13' partition (r6i5n[7-8],r6i6n0,r6i7n[7-8],r7i0n[0-5],r7i1n[7-8],r7i2n[0-1,5,8],r7i3n2,r7i5n7,r7i6n[1-4,8],r7i7n[0-4,6-8],r8i0n[0-8],r8i1n[0-4],r8i2n8,r8i3n[0-3,8],r8i4n[0-1],r8i6n[2-3,5-6],r8i7n[3-8],r9i0n[0-6,8],r9i1n[0-8],r9i2n[0,3-8],r9i3n[0-2,6-8],r9i4n[0-6,8],r9i5n[0-8],r9i6n[0-8],r9i7n[1-8])
iteration 10970/ 159576 | consumed samples: 1065440 | elapsed time per iteration (ms): 18268.5 | learning rate: 6.000E-05 | global batch size: 368 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 10980/ 159576 | consumed samples: 1069120 | elapsed time per iteration (ms): 18399.6 | learning rate: 6.000E-05 | global batch size: 368 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 10990/ 159576 | consumed samples: 1072800 | elapsed time per iteration (ms): 18217.5 | learning rate: 6.000E-05 | global batch size: 368 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 11000/ 159576 | consumed samples: 1076480 | elapsed time per iteration (ms): 18260.1 | learning rate: 6.000E-05 | global batch size: 368 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
-------------------------------------------------------------------------------------------------
validation loss at iteration 11000 | lm loss value: 7.284734E+00 | lm loss PPL: 1.457873E+03 |
-------------------------------------------------------------------------------------------------
iteration 11010/ 159576 | consumed samples: 1080160 | elapsed time per iteration (ms): 20666.6 | learning rate: 6.000E-05 | global batch size: 368 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 11020/ 159576 | consumed samples: 1083840 | elapsed time per iteration (ms): 18277.2 | learning rate: 6.000E-05 | global batch size: 368 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 11030/ 159576 | consumed samples: 1087552 | elapsed time per iteration (ms): 18419.3 | learning rate: 6.000E-05 | global batch size: 384 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 11040/ 159576 | consumed samples: 1091392 | elapsed time per iteration (ms): 19002.0 | learning rate: 6.000E-05 | global batch size: 384 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 11050/ 159576 | consumed samples: 1095232 | elapsed time per iteration (ms): 18930.9 | learning rate: 6.000E-05 | global batch size: 384 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 11060/ 159576 | consumed samples: 1099072 | elapsed time per iteration (ms): 18821.2 | learning rate: 6.000E-05 | global batch size: 384 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 11070/ 159576 | consumed samples: 1102912 | elapsed time per iteration (ms): 18889.6 | learning rate: 6.000E-05 | global batch size: 384 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 11080/ 159576 | consumed samples: 1106752 | elapsed time per iteration (ms): 18970.4 | learning rate: 6.000E-05 | global batch size: 384 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 11090/ 159576 | consumed samples: 1110592 | elapsed time per iteration (ms): 18822.6 | learning rate: 6.000E-05 | global batch size: 384 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 11100/ 159576 | consumed samples: 1114432 | elapsed time per iteration (ms): 18697.2 | learning rate: 6.000E-05 | global batch size: 384 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 11110/ 159576 | consumed samples: 1118272 | elapsed time per iteration (ms): 18737.4 | learning rate: 6.000E-05 | global batch size: 384 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 11120/ 159576 | consumed samples: 1122112 | elapsed time per iteration (ms): 18949.1 | learning rate: 6.000E-05 | global batch size: 384 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 11130/ 159576 | consumed samples: 1125952 | elapsed time per iteration (ms): 19003.8 | learning rate: 6.000E-05 | global batch size: 384 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 11140/ 159576 | consumed samples: 1129792 | elapsed time per iteration (ms): 18836.8 | learning rate: 6.000E-05 | global batch size: 384 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 11150/ 159576 | consumed samples: 1133632 | elapsed time per iteration (ms): 18941.7 | learning rate: 6.000E-05 | global batch size: 384 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 11160/ 159576 | consumed samples: 1137616 | elapsed time per iteration (ms): 19465.1 | learning rate: 6.000E-05 | global batch size: 400 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
[2021-09-27 10:14:56] PULSE: tr8-104B is running for 6:20:40 since 2021-09-27T03:54:16 (1188168 on 'gpu_p13' partition (r6i5n[7-8],r6i6n0,r6i7n[7-8],r7i0n[0-5],r7i1n[7-8],r7i2n[0-1,5,8],r7i3n2,r7i5n7,r7i6n[1-4,8],r7i7n[0-4,6-8],r8i0n[0-8],r8i1n[0-4],r8i2n8,r8i3n[0-3,8],r8i4n[0-1],r8i6n[2-3,5-6],r8i7n[3-8],r9i0n[0-6,8],r9i1n[0-8],r9i2n[0,3-8],r9i3n[0-2,6-8],r9i4n[0-6,8],r9i5n[0-8],r9i6n[0-8],r9i7n[1-8])
iteration 11170/ 159576 | consumed samples: 1141616 | elapsed time per iteration (ms): 19493.8 | learning rate: 6.000E-05 | global batch size: 400 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 11180/ 159576 | consumed samples: 1145616 | elapsed time per iteration (ms): 19504.7 | learning rate: 6.000E-05 | global batch size: 400 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 11190/ 159576 | consumed samples: 1149616 | elapsed time per iteration (ms): 19555.2 | learning rate: 6.000E-05 | global batch size: 400 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 11200/ 159576 | consumed samples: 1153616 | elapsed time per iteration (ms): 19490.6 | learning rate: 6.000E-05 | global batch size: 400 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 11210/ 159576 | consumed samples: 1157616 | elapsed time per iteration (ms): 19532.7 | learning rate: 6.000E-05 | global batch size: 400 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 11220/ 159576 | consumed samples: 1161616 | elapsed time per iteration (ms): 19261.8 | learning rate: 6.000E-05 | global batch size: 400 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 11230/ 159576 | consumed samples: 1165616 | elapsed time per iteration (ms): 19376.4 | learning rate: 6.000E-05 | global batch size: 400 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 11240/ 159576 | consumed samples: 1169616 | elapsed time per iteration (ms): 19505.2 | learning rate: 6.000E-05 | global batch size: 400 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 11250/ 159576 | consumed samples: 1173616 | elapsed time per iteration (ms): 19535.4 | learning rate: 6.000E-05 | global batch size: 400 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 11260/ 159576 | consumed samples: 1177616 | elapsed time per iteration (ms): 19415.2 | learning rate: 6.000E-05 | global batch size: 400 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 11270/ 159576 | consumed samples: 1181632 | elapsed time per iteration (ms): 19446.5 | learning rate: 6.000E-05 | global batch size: 416 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 11280/ 159576 | consumed samples: 1185792 | elapsed time per iteration (ms): 20068.3 | learning rate: 6.000E-05 | global batch size: 416 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 11290/ 159576 | consumed samples: 1189952 | elapsed time per iteration (ms): 19947.1 | learning rate: 6.000E-05 | global batch size: 416 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 11300/ 159576 | consumed samples: 1194112 | elapsed time per iteration (ms): 20002.0 | learning rate: 6.000E-05 | global batch size: 416 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 11310/ 159576 | consumed samples: 1198272 | elapsed time per iteration (ms): 20006.4 | learning rate: 6.000E-05 | global batch size: 416 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 11320/ 159576 | consumed samples: 1202432 | elapsed time per iteration (ms): 20000.1 | learning rate: 6.000E-05 | global batch size: 416 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 11330/ 159576 | consumed samples: 1206592 | elapsed time per iteration (ms): 20065.5 | learning rate: 6.000E-05 | global batch size: 416 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 11340/ 159576 | consumed samples: 1210752 | elapsed time per iteration (ms): 19952.9 | learning rate: 6.000E-05 | global batch size: 416 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
[2021-09-27 11:15:05] PULSE: tr8-104B is running for 7:20:49 since 2021-09-27T03:54:16 (1188168 on 'gpu_p13' partition (r6i5n[7-8],r6i6n0,r6i7n[7-8],r7i0n[0-5],r7i1n[7-8],r7i2n[0-1,5,8],r7i3n2,r7i5n7,r7i6n[1-4,8],r7i7n[0-4,6-8],r8i0n[0-8],r8i1n[0-4],r8i2n8,r8i3n[0-3,8],r8i4n[0-1],r8i6n[2-3,5-6],r8i7n[3-8],r9i0n[0-6,8],r9i1n[0-8],r9i2n[0,3-8],r9i3n[0-2,6-8],r9i4n[0-6,8],r9i5n[0-8],r9i6n[0-8],r9i7n[1-8])
iteration 11350/ 159576 | consumed samples: 1214912 | elapsed time per iteration (ms): 19989.1 | learning rate: 6.000E-05 | global batch size: 416 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 11360/ 159576 | consumed samples: 1219072 | elapsed time per iteration (ms): 19868.7 | learning rate: 6.000E-05 | global batch size: 416 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 11370/ 159576 | consumed samples: 1223232 | elapsed time per iteration (ms): 19987.6 | learning rate: 6.000E-05 | global batch size: 416 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 11380/ 159576 | consumed samples: 1227392 | elapsed time per iteration (ms): 19947.5 | learning rate: 6.000E-05 | global batch size: 416 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 11390/ 159576 | consumed samples: 1231664 | elapsed time per iteration (ms): 20206.1 | learning rate: 6.000E-05 | global batch size: 432 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 11400/ 159576 | consumed samples: 1235984 | elapsed time per iteration (ms): 20686.4 | learning rate: 6.000E-05 | global batch size: 432 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 11410/ 159576 | consumed samples: 1240304 | elapsed time per iteration (ms): 20763.5 | learning rate: 6.000E-05 | global batch size: 432 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 11420/ 159576 | consumed samples: 1244624 | elapsed time per iteration (ms): 20718.0 | learning rate: 6.000E-05 | global batch size: 432 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 11430/ 159576 | consumed samples: 1248944 | elapsed time per iteration (ms): 20629.3 | learning rate: 6.000E-05 | global batch size: 432 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 11440/ 159576 | consumed samples: 1253264 | elapsed time per iteration (ms): 20735.7 | learning rate: 6.000E-05 | global batch size: 432 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 11450/ 159576 | consumed samples: 1257584 | elapsed time per iteration (ms): 20551.6 | learning rate: 6.000E-05 | global batch size: 432 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 11460/ 159576 | consumed samples: 1261904 | elapsed time per iteration (ms): 20425.6 | learning rate: 6.000E-05 | global batch size: 432 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 11470/ 159576 | consumed samples: 1266224 | elapsed time per iteration (ms): 20522.3 | learning rate: 6.000E-05 | global batch size: 432 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 11480/ 159576 | consumed samples: 1270544 | elapsed time per iteration (ms): 20523.5 | learning rate: 6.000E-05 | global batch size: 432 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 11490/ 159576 | consumed samples: 1274864 | elapsed time per iteration (ms): 20644.7 | learning rate: 6.000E-05 | global batch size: 432 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 11500/ 159576 | consumed samples: 1279312 | elapsed time per iteration (ms): 21082.2 | learning rate: 6.000E-05 | global batch size: 448 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 11510/ 159576 | consumed samples: 1283792 | elapsed time per iteration (ms): 21312.4 | learning rate: 6.000E-05 | global batch size: 448 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 11520/ 159576 | consumed samples: 1288272 | elapsed time per iteration (ms): 21403.7 | learning rate: 6.000E-05 | global batch size: 448 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 11530/ 159576 | consumed samples: 1292752 | elapsed time per iteration (ms): 21133.4 | learning rate: 6.000E-05 | global batch size: 448 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 11540/ 159576 | consumed samples: 1297232 | elapsed time per iteration (ms): 21166.4 | learning rate: 6.000E-05 | global batch size: 448 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 11550/ 159576 | consumed samples: 1301712 | elapsed time per iteration (ms): 21259.6 | learning rate: 6.000E-05 | global batch size: 448 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
[2021-09-27 12:27:56] PULSE: tr8-104B is running for 8:33:40 since 2021-09-27T03:54:16 (1188168 on 'gpu_p13' partition (r6i5n[7-8],r6i6n0,r6i7n[7-8],r7i0n[0-5],r7i1n[7-8],r7i2n[0-1,5,8],r7i3n2,r7i5n7,r7i6n[1-4,8],r7i7n[0-4,6-8],r8i0n[0-8],r8i1n[0-4],r8i2n8,r8i3n[0-3,8],r8i4n[0-1],r8i6n[2-3,5-6],r8i7n[3-8],r9i0n[0-6,8],r9i1n[0-8],r9i2n[0,3-8],r9i3n[0-2,6-8],r9i4n[0-6,8],r9i5n[0-8],r9i6n[0-8],r9i7n[1-8])
iteration 11560/ 159576 | consumed samples: 1306192 | elapsed time per iteration (ms): 21050.1 | learning rate: 6.000E-05 | global batch size: 448 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 11570/ 159576 | consumed samples: 1310672 | elapsed time per iteration (ms): 21058.2 | learning rate: 6.000E-05 | global batch size: 448 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 11580/ 159576 | consumed samples: 1315152 | elapsed time per iteration (ms): 21057.7 | learning rate: 6.000E-05 | global batch size: 448 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 11590/ 159576 | consumed samples: 1319632 | elapsed time per iteration (ms): 21281.4 | learning rate: 6.000E-05 | global batch size: 448 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 11600/ 159576 | consumed samples: 1324144 | elapsed time per iteration (ms): 21318.5 | learning rate: 6.000E-05 | global batch size: 464 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 11610/ 159576 | consumed samples: 1328784 | elapsed time per iteration (ms): 21769.2 | learning rate: 6.000E-05 | global batch size: 464 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 11620/ 159576 | consumed samples: 1333424 | elapsed time per iteration (ms): 21656.2 | learning rate: 6.000E-05 | global batch size: 464 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 11630/ 159576 | consumed samples: 1338064 | elapsed time per iteration (ms): 21947.9 | learning rate: 6.000E-05 | global batch size: 464 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 11640/ 159576 | consumed samples: 1342704 | elapsed time per iteration (ms): 21602.8 | learning rate: 6.000E-05 | global batch size: 464 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 11650/ 159576 | consumed samples: 1347344 | elapsed time per iteration (ms): 21770.3 | learning rate: 6.000E-05 | global batch size: 464 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 11660/ 159576 | consumed samples: 1351984 | elapsed time per iteration (ms): 21697.2 | learning rate: 6.000E-05 | global batch size: 464 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 11670/ 159576 | consumed samples: 1356624 | elapsed time per iteration (ms): 22004.7 | learning rate: 6.000E-05 | global batch size: 464 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 11680/ 159576 | consumed samples: 1361264 | elapsed time per iteration (ms): 21654.6 | learning rate: 6.000E-05 | global batch size: 464 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 11690/ 159576 | consumed samples: 1365904 | elapsed time per iteration (ms): 21840.4 | learning rate: 6.000E-05 | global batch size: 464 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 11700/ 159576 | consumed samples: 1370560 | elapsed time per iteration (ms): 21982.9 | learning rate: 6.000E-05 | global batch size: 480 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 11710/ 159576 | consumed samples: 1375360 | elapsed time per iteration (ms): 22227.6 | learning rate: 6.000E-05 | global batch size: 480 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 11720/ 159576 | consumed samples: 1380160 | elapsed time per iteration (ms): 22533.1 | learning rate: 6.000E-05 | global batch size: 480 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
[2021-09-27 13:27:56] PULSE: tr8-104B is running for 9:33:40 since 2021-09-27T03:54:16 (1188168 on 'gpu_p13' partition (r6i5n[7-8],r6i6n0,r6i7n[7-8],r7i0n[0-5],r7i1n[7-8],r7i2n[0-1,5,8],r7i3n2,r7i5n7,r7i6n[1-4,8],r7i7n[0-4,6-8],r8i0n[0-8],r8i1n[0-4],r8i2n8,r8i3n[0-3,8],r8i4n[0-1],r8i6n[2-3,5-6],r8i7n[3-8],r9i0n[0-6,8],r9i1n[0-8],r9i2n[0,3-8],r9i3n[0-2,6-8],r9i4n[0-6,8],r9i5n[0-8],r9i6n[0-8],r9i7n[1-8])
iteration 11730/ 159576 | consumed samples: 1384960 | elapsed time per iteration (ms): 22192.1 | learning rate: 6.000E-05 | global batch size: 480 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 11740/ 159576 | consumed samples: 1389760 | elapsed time per iteration (ms): 22268.7 | learning rate: 6.000E-05 | global batch size: 480 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 11750/ 159576 | consumed samples: 1394560 | elapsed time per iteration (ms): 22268.4 | learning rate: 6.000E-05 | global batch size: 480 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 11760/ 159576 | consumed samples: 1399360 | elapsed time per iteration (ms): 22141.9 | learning rate: 6.000E-05 | global batch size: 480 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 11770/ 159576 | consumed samples: 1404160 | elapsed time per iteration (ms): 21979.0 | learning rate: 6.000E-05 | global batch size: 480 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 11780/ 159576 | consumed samples: 1408960 | elapsed time per iteration (ms): 22172.2 | learning rate: 6.000E-05 | global batch size: 480 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 11790/ 159576 | consumed samples: 1413760 | elapsed time per iteration (ms): 22335.9 | learning rate: 6.000E-05 | global batch size: 480 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 11800/ 159576 | consumed samples: 1418592 | elapsed time per iteration (ms): 22588.3 | learning rate: 6.000E-05 | global batch size: 496 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 11810/ 159576 | consumed samples: 1423552 | elapsed time per iteration (ms): 22823.4 | learning rate: 6.000E-05 | global batch size: 496 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 11820/ 159576 | consumed samples: 1428512 | elapsed time per iteration (ms): 22959.2 | learning rate: 6.000E-05 | global batch size: 496 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 11830/ 159576 | consumed samples: 1433472 | elapsed time per iteration (ms): 23080.3 | learning rate: 6.000E-05 | global batch size: 496 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 11840/ 159576 | consumed samples: 1438432 | elapsed time per iteration (ms): 23034.0 | learning rate: 6.000E-05 | global batch size: 496 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 11850/ 159576 | consumed samples: 1443392 | elapsed time per iteration (ms): 23099.6 | learning rate: 6.000E-05 | global batch size: 496 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 11860/ 159576 | consumed samples: 1448352 | elapsed time per iteration (ms): 23031.2 | learning rate: 6.000E-05 | global batch size: 496 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 11870/ 159576 | consumed samples: 1453312 | elapsed time per iteration (ms): 22866.8 | learning rate: 6.000E-05 | global batch size: 496 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 11880/ 159576 | consumed samples: 1458272 | elapsed time per iteration (ms): 23007.5 | learning rate: 6.000E-05 | global batch size: 496 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
[2021-09-27 14:27:59] PULSE: tr8-104B is running for 10:33:43 since 2021-09-27T03:54:16 (1188168 on 'gpu_p13' partition (r6i5n[7-8],r6i6n0,r6i7n[7-8],r7i0n[0-5],r7i1n[7-8],r7i2n[0-1,5,8],r7i3n2,r7i5n7,r7i6n[1-4,8],r7i7n[0-4,6-8],r8i0n[0-8],r8i1n[0-4],r8i2n8,r8i3n[0-3,8],r8i4n[0-1],r8i6n[2-3,5-6],r8i7n[3-8],r9i0n[0-6,8],r9i1n[0-8],r9i2n[0,3-8],r9i3n[0-2,6-8],r9i4n[0-6,8],r9i5n[0-8],r9i6n[0-8],r9i7n[1-8])
iteration 11890/ 159576 | consumed samples: 1463232 | elapsed time per iteration (ms): 23034.3 | learning rate: 6.000E-05 | global batch size: 496 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 11900/ 159576 | consumed samples: 1468304 | elapsed time per iteration (ms): 23486.5 | learning rate: 6.000E-05 | global batch size: 512 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 11910/ 159576 | consumed samples: 1473424 | elapsed time per iteration (ms): 23540.7 | learning rate: 6.000E-05 | global batch size: 512 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 11920/ 159576 | consumed samples: 1478544 | elapsed time per iteration (ms): 23676.0 | learning rate: 6.000E-05 | global batch size: 512 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 11930/ 159576 | consumed samples: 1483664 | elapsed time per iteration (ms): 23529.7 | learning rate: 6.000E-05 | global batch size: 512 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 11940/ 159576 | consumed samples: 1488784 | elapsed time per iteration (ms): 23604.1 | learning rate: 6.000E-05 | global batch size: 512 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 11950/ 159576 | consumed samples: 1493904 | elapsed time per iteration (ms): 23627.0 | learning rate: 6.000E-05 | global batch size: 512 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 11960/ 159576 | consumed samples: 1499024 | elapsed time per iteration (ms): 23559.5 | learning rate: 6.000E-05 | global batch size: 512 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 11970/ 159576 | consumed samples: 1504144 | elapsed time per iteration (ms): 23611.0 | learning rate: 6.000E-05 | global batch size: 512 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 11980/ 159576 | consumed samples: 1509264 | elapsed time per iteration (ms): 23634.8 | learning rate: 6.000E-05 | global batch size: 512 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 11990/ 159576 | consumed samples: 1514464 | elapsed time per iteration (ms): 23596.0 | learning rate: 6.000E-05 | global batch size: 528 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
[2021-09-27 15:14:45,510] [INFO] [logging.py:68:log_dist] [Rank 0] step=12000, skipped=3052, lr=[5.999919375575235e-05, 5.999919375575235e-05], mom=[(0.9, 0.999), (0.9, 0.999)]
steps: 12000 loss: nan iter time (s): 0.012 samples/sec: 43274.454
iteration 12000/ 159576 | consumed samples: 1519744 | elapsed time per iteration (ms): 24091.4 | learning rate: 6.000E-05 | global batch size: 528 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
-------------------------------------------------------------------------------------------------
validation loss at iteration 12000 | lm loss value: 7.282808E+00 | lm loss PPL: 1.455068E+03 |
-------------------------------------------------------------------------------------------------
saving checkpoint at iteration 12000 to /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints
[2021-09-27 15:15:22,225] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/global_step12000/mp_rank_00_model_states.pt
successfully saved checkpoint at iteration 12000 to /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints
time (ms) | save-checkpoint: 32585.61
iteration 12010/ 159576 | consumed samples: 1525024 | elapsed time per iteration (ms): 30246.8 | learning rate: 6.000E-05 | global batch size: 528 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 12020/ 159576 | consumed samples: 1530304 | elapsed time per iteration (ms): 24139.3 | learning rate: 6.000E-05 | global batch size: 528 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 12030/ 159576 | consumed samples: 1535584 | elapsed time per iteration (ms): 24280.0 | learning rate: 6.000E-05 | global batch size: 528 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
[2021-09-27 15:28:02] PULSE: tr8-104B is running for 11:33:46 since 2021-09-27T03:54:16 (1188168 on 'gpu_p13' partition (r6i5n[7-8],r6i6n0,r6i7n[7-8],r7i0n[0-5],r7i1n[7-8],r7i2n[0-1,5,8],r7i3n2,r7i5n7,r7i6n[1-4,8],r7i7n[0-4,6-8],r8i0n[0-8],r8i1n[0-4],r8i2n8,r8i3n[0-3,8],r8i4n[0-1],r8i6n[2-3,5-6],r8i7n[3-8],r9i0n[0-6,8],r9i1n[0-8],r9i2n[0,3-8],r9i3n[0-2,6-8],r9i4n[0-6,8],r9i5n[0-8],r9i6n[0-8],r9i7n[1-8])
iteration 12040/ 159576 | consumed samples: 1540864 | elapsed time per iteration (ms): 23963.9 | learning rate: 6.000E-05 | global batch size: 528 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 12050/ 159576 | consumed samples: 1546144 | elapsed time per iteration (ms): 24135.8 | learning rate: 6.000E-05 | global batch size: 528 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 12060/ 159576 | consumed samples: 1551424 | elapsed time per iteration (ms): 24044.3 | learning rate: 6.000E-05 | global batch size: 528 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 12070/ 159576 | consumed samples: 1556704 | elapsed time per iteration (ms): 24087.4 | learning rate: 6.000E-05 | global batch size: 528 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 12080/ 159576 | consumed samples: 1562064 | elapsed time per iteration (ms): 24400.0 | learning rate: 6.000E-05 | global batch size: 544 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 12090/ 159576 | consumed samples: 1567504 | elapsed time per iteration (ms): 24552.7 | learning rate: 6.000E-05 | global batch size: 544 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 12100/ 159576 | consumed samples: 1572944 | elapsed time per iteration (ms): 24886.7 | learning rate: 6.000E-05 | global batch size: 544 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 12110/ 159576 | consumed samples: 1578384 | elapsed time per iteration (ms): 24781.4 | learning rate: 6.000E-05 | global batch size: 544 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 12120/ 159576 | consumed samples: 1583824 | elapsed time per iteration (ms): 24493.1 | learning rate: 6.000E-05 | global batch size: 544 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 12130/ 159576 | consumed samples: 1589264 | elapsed time per iteration (ms): 24851.3 | learning rate: 6.000E-05 | global batch size: 544 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 12140/ 159576 | consumed samples: 1594704 | elapsed time per iteration (ms): 24746.4 | learning rate: 6.000E-05 | global batch size: 544 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 12150/ 159576 | consumed samples: 1600144 | elapsed time per iteration (ms): 24578.3 | learning rate: 6.000E-05 | global batch size: 544 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 12160/ 159576 | consumed samples: 1605584 | elapsed time per iteration (ms): 24469.2 | learning rate: 6.000E-05 | global batch size: 544 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 12170/ 159576 | consumed samples: 1611152 | elapsed time per iteration (ms): 24994.1 | learning rate: 6.000E-05 | global batch size: 560 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
[2021-09-27 16:28:40] PULSE: tr8-104B is running for 12:34:24 since 2021-09-27T03:54:16 (1188168 on 'gpu_p13' partition (r6i5n[7-8],r6i6n0,r6i7n[7-8],r7i0n[0-5],r7i1n[7-8],r7i2n[0-1,5,8],r7i3n2,r7i5n7,r7i6n[1-4,8],r7i7n[0-4,6-8],r8i0n[0-8],r8i1n[0-4],r8i2n8,r8i3n[0-3,8],r8i4n[0-1],r8i6n[2-3,5-6],r8i7n[3-8],r9i0n[0-6,8],r9i1n[0-8],r9i2n[0,3-8],r9i3n[0-2,6-8],r9i4n[0-6,8],r9i5n[0-8],r9i6n[0-8],r9i7n[1-8])
iteration 12180/ 159576 | consumed samples: 1616752 | elapsed time per iteration (ms): 25275.1 | learning rate: 6.000E-05 | global batch size: 560 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 12190/ 159576 | consumed samples: 1622352 | elapsed time per iteration (ms): 25176.8 | learning rate: 6.000E-05 | global batch size: 560 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 12200/ 159576 | consumed samples: 1627952 | elapsed time per iteration (ms): 25167.8 | learning rate: 6.000E-05 | global batch size: 560 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 12210/ 159576 | consumed samples: 1633552 | elapsed time per iteration (ms): 25057.7 | learning rate: 6.000E-05 | global batch size: 560 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 12220/ 159576 | consumed samples: 1639152 | elapsed time per iteration (ms): 25147.4 | learning rate: 6.000E-05 | global batch size: 560 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 12230/ 159576 |
consumed samples: 1644752 | elapsed time per iteration (ms): 25198.7 | learning rate: 6.000E-05 | global batch size: 560 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 12240/ 159576 | consumed samples: 1650352 | elapsed time per iteration (ms): 24894.2 | learning rate: 6.000E-05 | global batch size: 560 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 12250/ 159576 | consumed samples: 1656016 | elapsed time per iteration (ms): 25306.4 | learning rate: 6.000E-05 | global batch size: 576 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 12260/ 159576 | consumed samples: 1661776 | elapsed time per iteration (ms): 25946.7 | learning rate: 6.000E-05 | global batch size: 576 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 12270/ 159576 | consumed samples: 1667536 | elapsed time per iteration (ms): 25714.3 | learning rate: 6.000E-05 | global batch size: 576 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 12280/ 159576 | consumed samples: 1673296 | elapsed time per iteration (ms): 25863.6 | learning rate: 6.000E-05 | global batch size: 576 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 12290/ 159576 | consumed samples: 1679056 | elapsed time per iteration (ms): 26038.1 | learning rate: 6.000E-05 | global batch size: 576 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 12300/ 159576 | consumed samples: 1684816 | elapsed time per iteration (ms): 
25611.4 | learning rate: 6.000E-05 | global batch size: 576 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 12310/ 159576 | consumed samples: 1690576 | elapsed time per iteration (ms): 25819.3 | learning rate: 6.000E-05 | global batch size: 576 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) [2021-09-27 17:28:18] PULSE: tr8-104B is running for 13:34:02 since 2021-09-27T03:54:16 (1188168 on 'gpu_p13' partition (r6i5n[7-8],r6i6n0,r6i7n[7-8],r7i0n[0-5],r7i1n[7-8],r7i2n[0-1,5,8],r7i3n2,r7i5n7,r7i6n[1-4,8],r7i7n[0-4,6-8],r8i0n[0-8],r8i1n[0-4],r8i2n8,r8i3n[0-3,8],r8i4n[0-1],r8i6n[2-3,5-6],r8i7n[3-8],r9i0n[0-6,8],r9i1n[0-8],r9i2n[0,3-8],r9i3n[0-2,6-8],r9i4n[0-6,8],r9i5n[0-8],r9i6n[0-8],r9i7n[1-8]) iteration 12320/ 159576 | consumed samples: 1696336 | elapsed time per iteration (ms): 25983.5 | learning rate: 6.000E-05 | global batch size: 576 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 12330/ 159576 | consumed samples: 1702128 | elapsed time per iteration (ms): 25674.0 | learning rate: 6.000E-05 | global batch size: 592 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 12340/ 159576 | consumed samples: 1708048 | elapsed time per iteration (ms): 26437.1 | learning rate: 6.000E-05 | global batch size: 592 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) Killing subprocess 76100 Killing subprocess 76101 Killing subprocess 76102 Killing subprocess 76103 Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/runpy.py", line 194, in _run_module_as_main return _run_code(code, main_globals, None, File 
"/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/runpy.py", line 87, in _run_code exec(code, run_globals) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch/distributed/launch.py", line 340, in main() File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch/distributed/launch.py", line 326, in main sigkill_handler(signal.SIGTERM, None) # not coming back File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch/distributed/launch.py", line 301, in sigkill_handler raise subprocess.CalledProcessError(returncode=last_return_code, cmd=cmd) subprocess.CalledProcessError: Command '['/gpfswork/rech/six/commun/conda/tr1-13B/bin/python', '-u', '/gpfswork/rech/six/commun/code/tr8-104B/Megatron-DeepSpeed-tr8-104B/pretrain_gpt.py', '--local_rank=3', '--tensor-model-parallel-size', '4', '--pipeline-model-parallel-size', '8', '--num-layers', '32', '--hidden-size', '16384', '--ffn-hidden-size', '20480', '--num-attention-heads', '32', '--seq-length', '2048', '--max-position-embeddings', '2048', '--micro-batch-size', '1', '--rampup-batch-size', '16', '16', '6_000_000', '--global-batch-size', '2048', '--train-samples', '300_000_000', '--vocab-file', '/gpfswork/rech/six/commun/code/tr8-104B/Megatron-DeepSpeed-tr8-104B/data/gpt2-vocab.json', '--merge-file', '/gpfswork/rech/six/commun/code/tr8-104B/Megatron-DeepSpeed-tr8-104B/data/gpt2-merges.txt', '--loss-scale', '12', '--clip-grad', '1.0', '--fp16', '--checkpoint-activations', '--seed', '42', '--optimizer', 'adam', '--adam-beta1', '0.9', '--adam-beta2', '0.999', '--adam-eps', '1e-8', '--lr', '6e-5', '--min-lr', '6e-6', '--lr-decay-style', 'cosine', '--lr-decay-samples', '126_953_125', '--lr-warmup-samples', '216_320', '--clip-grad', '1.0', '--weight-decay', '1e-1', '--exit-duration-in-mins', '1190', '--log-interval', '10', '--save-interval', '1500', '--eval-interval', '1000', '--eval-iters', '5', '--codecarbon-dir', 
'/gpfsscratch/rech/six/commun/checkpoints/tr8-104B/tr8-104B-logs/codecarbon', '--tensorboard-dir', '/gpfsscratch/rech/six/commun/checkpoints/tr8-104B/tr8-104B-logs/tensorboard', '--tensorboard-queue-size', '5', '--log-timers-to-tensorboard', '--log-batch-size-to-tensorboard', '--log-validation-ppl-to-tensorboard', '--save', '/gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints', '--load', '/gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints', '--data-path', '/gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document', '--data-impl', 'mmap', '--split', '949,50,1', '--distributed-backend', 'nccl', '--deepspeed', '--deepspeed_config', './ds_config.1188168.json', '--zero-stage', '1', '--deepspeed-activation-checkpointing']' died with . srun: error: r6i5n7: task 0: Exited with exit code 1 srun: Terminating job step 1188168.0 Killing subprocess 59848 Killing subprocess 59849 Killing subprocess 59850 Killing subprocess 69437 Killing subprocess 59851 Killing subprocess 3750 Killing subprocess 69438 Killing subprocess 23911 Killing subprocess 36274 Killing subprocess 12887 Killing subprocess 64701 Killing subprocess 46448 Killing subprocess 37626 Killing subprocess 69439 Killing subprocess 12566 Killing subprocess 45975 Killing subprocess 59577 Killing subprocess 3751 Killing subprocess 69440 Killing subprocess 20638 Killing subprocess 12618 Killing subprocess 63737 Killing subprocess 12888 Killing subprocess 24910 Killing subprocess 77610 Killing subprocess 3752 Killing subprocess 65070 Killing subprocess 64702 Killing subprocess 46449 Killing subprocess 3710 Killing subprocess 36275 Killing subprocess 59578 Killing subprocess 64317 Killing subprocess 37627 Killing subprocess 23912 Killing subprocess 54693 Killing subprocess 76941 Killing subprocess 20639 Killing subprocess 74689 Killing subprocess 65692 Killing subprocess 12619 Killing subprocess 12567 Killing subprocess 63738 Killing subprocess 19395 Killing subprocess 44152 Killing 
subprocess 35247 Killing subprocess 14362 Killing subprocess 77611 Killing subprocess 59276 Killing subprocess 59579 Main process received SIGTERM, exiting Killing subprocess 37628 Killing subprocess 3753 Killing subprocess 65071 Main process received SIGTERM, exiting Killing subprocess 23913 Killing subprocess 54694 Killing subprocess 64703 Killing subprocess 12568 Killing subprocess 63739 Killing subprocess 46450 Killing subprocess 45976 Killing subprocess 3711 Killing subprocess 38195 Killing subprocess 36276 Killing subprocess 12889 Killing subprocess 24911 Killing subprocess 10979 Killing subprocess 77612 Killing subprocess 59580 Killing subprocess 18302 Killing subprocess 63373 Killing subprocess 64318 Killing subprocess 37630 Killing subprocess 65072 Killing subprocess 52483 Killing subprocess 23914 Killing subprocess 54695 Killing subprocess 68328 Killing subprocess 76942 Killing subprocess 20640 Killing subprocess 74690 Killing subprocess 65693 Killing subprocess 64705 Killing subprocess 12620 Killing subprocess 12569 Killing subprocess 63740 Killing subprocess 46451 Killing subprocess 45977 Killing subprocess 55848 Killing subprocess 3712 Killing subprocess 19396 Killing subprocess 44153 Killing subprocess 35248 Killing subprocess 47024 Killing subprocess 33695 Killing subprocess 36277 Killing subprocess 12891 Killing subprocess 63460 Killing subprocess 14363 Killing subprocess 57783 Killing subprocess 24912 Killing subprocess 10980 Killing subprocess 77613 Killing subprocess 59277 Killing subprocess 69993 Killing subprocess 53038 Killing subprocess 18303 Killing subprocess 63374 Killing subprocess 64319 Killing subprocess 8034 Killing subprocess 62238 Main process received SIGTERM, exiting Killing subprocess 53475 Killing subprocess 65073 Killing subprocess 52484 Killing subprocess 54696 Killing subprocess 68329 Killing subprocess 76943 Killing subprocess 20641 Killing subprocess 74691 Killing subprocess 65694 Killing subprocess 43049 Killing subprocess 
12621 Main process received SIGTERM, exiting Main process received SIGTERM, exiting Killing subprocess 45978 Killing subprocess 55849 Killing subprocess 3713 Killing subprocess 39768 Killing subprocess 19397 Killing subprocess 44154 Killing subprocess 35249 Killing subprocess 47025 Killing subprocess 71483 Killing subprocess 33696 Killing subprocess 38196 Main process received SIGTERM, exiting Main process received SIGTERM, exiting Killing subprocess 63461 Killing subprocess 14364 Killing subprocess 57784 Killing subprocess 24913 Main process received SIGTERM, exiting Killing subprocess 59278 Killing subprocess 70408 Killing subprocess 69994 Killing subprocess 2853 Killing subprocess 53039 Killing subprocess 18304 Killing subprocess 52628 Killing subprocess 63375 Killing subprocess 64320 Killing subprocess 77051 Killing subprocess 41073 Killing subprocess 8035 Killing subprocess 3968 Killing subprocess 23148 Killing subprocess 67068 Main process received SIGTERM, exiting Killing subprocess 81189 Killing subprocess 62239 Killing subprocess 53476 Killing subprocess 69086 Killing subprocess 52485 Main process received SIGTERM, exiting Killing subprocess 62883 Killing subprocess 65551 Killing subprocess 68330 Killing subprocess 76945 Main process received SIGTERM, exiting Killing subprocess 75336 Killing subprocess 15286 Killing subprocess 74692 Killing subprocess 65695 Killing subprocess 43050 Main process received SIGTERM, exiting Killing subprocess 66988 Main process received SIGTERM, exiting Main process received SIGTERM, exiting Killing subprocess 55850 Killing subprocess 42101 Main process received SIGTERM, exiting Killing subprocess 8608 Killing subprocess 39769 Killing subprocess 19398 Killing subprocess 44155 Killing subprocess 15244 Killing subprocess 50869 Killing subprocess 35250 Killing subprocess 47026 Killing subprocess 71484 Killing subprocess 35789 Killing subprocess 56590 Killing subprocess 33697 Killing subprocess 38197 Killing subprocess 21496 
Killing subprocess 63462 Killing subprocess 81499 Killing subprocess 14365 Killing subprocess 57785 Main process received SIGTERM, exiting Killing subprocess 10981 Killing subprocess 59279 Killing subprocess 37333 Main process received SIGTERM, exiting Killing subprocess 48823 Killing subprocess 70409 Killing subprocess 69995 Killing subprocess 2854 Killing subprocess 53040 Killing subprocess 18305 Killing subprocess 52629 Killing subprocess 63376 Main process received SIGTERM, exiting Killing subprocess 77052 Killing subprocess 41074 Killing subprocess 8036 Killing subprocess 39465 Killing subprocess 39466 Killing subprocess 39467 Killing subprocess 79012 Killing subprocess 3969 Killing subprocess 23149 Killing subprocess 67069 Killing subprocess 81190 Killing subprocess 56744 Killing subprocess 66319 Killing subprocess 62240 Killing subprocess 53477 Killing subprocess 25176 Killing subprocess 69087 Main process received SIGTERM, exiting Killing subprocess 52486 Killing subprocess 23707 Killing subprocess 62884 Main process received SIGTERM, exiting Killing subprocess 65552 Killing subprocess 68331 Killing subprocess 10802 Main process received SIGTERM, exiting Killing subprocess 37596 Killing subprocess 75337 Killing subprocess 15287 Main process received SIGTERM, exiting Main process received SIGTERM, exiting Main process received SIGTERM, exiting Killing subprocess 43051 Killing subprocess 12337 Killing subprocess 66989 Killing subprocess 50840 Killing subprocess 55851 Killing subprocess 42102 Killing subprocess 77529 Killing subprocess 13528 Killing subprocess 8609 Killing subprocess 14216 Killing subprocess 39770 Main process received SIGTERM, exiting Main process received SIGTERM, exiting Killing subprocess 15245 Killing subprocess 50870 Main process received SIGTERM, exiting Killing subprocess 47027 Killing subprocess 79944 Killing subprocess 71485 Killing subprocess 9027 Killing subprocess 35790 Killing subprocess 56591 Killing subprocess 33699 Killing 
subprocess 38198 Killing subprocess 37572 Killing subprocess 21497 Killing subprocess 63463 Killing subprocess 81500 Main process received SIGTERM, exiting Killing subprocess 57787 Killing subprocess 41379 Killing subprocess 10982 Main process received SIGTERM, exiting Killing subprocess 37334 Killing subprocess 48824 Killing subprocess 38560 Killing subprocess 41538 Killing subprocess 70410 Killing subprocess 69997 Killing subprocess 55623 Killing subprocess 2855 Killing subprocess 53042 Main process received SIGTERM, exiting Killing subprocess 52630 Main process received SIGTERM, exiting Killing subprocess 77053 Killing subprocess 41075 Killing subprocess 76949 Killing subprocess 8037 Killing subprocess 39468 Main process received SIGTERM, exiting Killing subprocess 79013 Killing subprocess 3970 Killing subprocess 23150 Killing subprocess 67070 Killing subprocess 2742 Killing subprocess 81191 Killing subprocess 47225 Killing subprocess 56745 Killing subprocess 66320 Killing subprocess 62241 Killing subprocess 54272 Killing subprocess 53478 Killing subprocess 25177 Killing subprocess 69088 Main process received SIGTERM, exiting Killing subprocess 23708 Killing subprocess 62885 Killing subprocess 79197 Killing subprocess 65553 Main process received SIGTERM, exiting Killing subprocess 10803 Killing subprocess 37597 Killing subprocess 75338 Killing subprocess 15288 Killing subprocess 43052 Killing subprocess 12338 Killing subprocess 14353 Killing subprocess 66990 Killing subprocess 50841 Killing subprocess 75513 Main process received SIGTERM, exiting Killing subprocess 42103 Killing subprocess 77530 Killing subprocess 13529 Killing subprocess 8610 Killing subprocess 14217 Killing subprocess 39772 Killing subprocess 15246 Killing subprocess 50871 Killing subprocess 52998 Killing subprocess 75590 Main process received SIGTERM, exiting Killing subprocess 79945 Killing subprocess 71487 Killing subprocess 9028 Killing subprocess 35791 Killing subprocess 56592 Main process 
received SIGTERM, exiting Main process received SIGTERM, exiting Killing subprocess 37573 Killing subprocess 21498 Main process received SIGTERM, exiting Killing subprocess 81501 Main process received SIGTERM, exiting Killing subprocess 41380 Main process received SIGTERM, exiting Killing subprocess 37335 Killing subprocess 48825 Killing subprocess 38561 Killing subprocess 41539 Killing subprocess 70411 Main process received SIGTERM, exiting Killing subprocess 55624 Killing subprocess 69208 Killing subprocess 2856 Main process received SIGTERM, exiting Killing subprocess 52631 Killing subprocess 35916 Killing subprocess 4836 Killing subprocess 77055 Killing subprocess 41076 Killing subprocess 76950 Main process received SIGTERM, exiting Killing subprocess 47505 Killing subprocess 79014 Killing subprocess 3971 Killing subprocess 23151 Killing subprocess 67071 Killing subprocess 34883 Killing subprocess 2743 Killing subprocess 81192 Killing subprocess 47226 Killing subprocess 56746 Killing subprocess 17937 Killing subprocess 66321 Main process received SIGTERM, exiting Killing subprocess 54273 Main process received SIGTERM, exiting Killing subprocess 25178 Killing subprocess 69089 Killing subprocess 23709 Killing subprocess 62886 Killing subprocess 79198 Killing subprocess 65554 Killing subprocess 67154 Killing subprocess 10804 Killing subprocess 37598 Killing subprocess 75339 Killing subprocess 15289 Main process received SIGTERM, exiting Killing subprocess 12339 Killing subprocess 14354 Killing subprocess 66992 Killing subprocess 50842 Killing subprocess 39827 Killing subprocess 75514 Killing subprocess 42105 Killing subprocess 77531 Killing subprocess 53851 Killing subprocess 13530 Killing subprocess 8611 Killing subprocess 14218 Main process received SIGTERM, exiting Killing subprocess 15247 Killing subprocess 50872 Killing subprocess 52999 Killing subprocess 75591 Killing subprocess 44143 Killing subprocess 79946 Main process received SIGTERM, exiting Killing 
subprocess 9029 Killing subprocess 35792 Killing subprocess 56593 Killing subprocess 37574 Killing subprocess 57528 Killing subprocess 21499 Killing subprocess 81502 Killing subprocess 41381 Killing subprocess 37336 Killing subprocess 48826 Killing subprocess 16969 Killing subprocess 38562 Killing subprocess 41540 Main process received SIGTERM, exiting Killing subprocess 55625 Killing subprocess 69209 Main process received SIGTERM, exiting Main process received SIGTERM, exiting Killing subprocess 35917 Killing subprocess 4837 Main process received SIGTERM, exiting Main process received SIGTERM, exiting Killing subprocess 76951 Killing subprocess 47506 Killing subprocess 79015 Main process received SIGTERM, exiting Main process received SIGTERM, exiting Main process received SIGTERM, exiting Killing subprocess 34884 Killing subprocess 2744 Main process received SIGTERM, exiting Killing subprocess 47227 Killing subprocess 56747 Killing subprocess 17938 Killing subprocess 66322 Killing subprocess 45571 Killing subprocess 54274 Killing subprocess 25179 Main process received SIGTERM, exiting Killing subprocess 23711 Main process received SIGTERM, exiting Killing subprocess 79199 Main process received SIGTERM, exiting Killing subprocess 67155 Killing subprocess 10805 Killing subprocess 37599 Main process received SIGTERM, exiting Main process received SIGTERM, exiting Killing subprocess 12340 Killing subprocess 14355 Main process received SIGTERM, exiting Killing subprocess 50844 Killing subprocess 39828 Killing subprocess 75515 Main process received SIGTERM, exiting Killing subprocess 7953 Killing subprocess 77532 Killing subprocess 53852 Killing subprocess 13531 Main process received SIGTERM, exiting Killing subprocess 14219 Main process received SIGTERM, exiting Main process received SIGTERM, exiting Killing subprocess 53000 Killing subprocess 75592 Killing subprocess 44144 Killing subprocess 79947 Killing subprocess 9030 Main process received SIGTERM, exiting Main 
process received SIGTERM, exiting Killing subprocess 37575 Killing subprocess 57529 Main process received SIGTERM, exiting Main process received SIGTERM, exiting Killing subprocess 41383 Main process received SIGTERM, exiting Main process received SIGTERM, exiting Killing subprocess 16970 Killing subprocess 38563 Killing subprocess 41541 Killing subprocess 55626 Killing subprocess 69210 Killing subprocess 35918 Killing subprocess 4838 Killing subprocess 76953 Killing subprocess 47507 Main process received SIGTERM, exiting Killing subprocess 34885 Killing subprocess 2745 Killing subprocess 47228 Main process received SIGTERM, exiting Killing subprocess 17939 Main process received SIGTERM, exiting Killing subprocess 45572 Killing subprocess 54275 Main process received SIGTERM, exiting Killing subprocess 34811 Main process received SIGTERM, exiting Killing subprocess 79200 Killing subprocess 67156 Main process received SIGTERM, exiting Main process received SIGTERM, exiting Main process received SIGTERM, exiting Killing subprocess 14357 Main process received SIGTERM, exiting Killing subprocess 39829 Killing subprocess 75516 Killing subprocess 7954 Main process received SIGTERM, exiting Killing subprocess 53853 Main process received SIGTERM, exiting Main process received SIGTERM, exiting Killing subprocess 53002 Killing subprocess 75593 Killing subprocess 44145 Main process received SIGTERM, exiting Main process received SIGTERM, exiting Main process received SIGTERM, exiting Killing subprocess 57530 Main process received SIGTERM, exiting Killing subprocess 16971 Main process received SIGTERM, exiting Main process received SIGTERM, exiting Main process received SIGTERM, exiting Killing subprocess 69211 Killing subprocess 35919 Killing subprocess 4839 Main process received SIGTERM, exiting Killing subprocess 47509 Killing subprocess 34886 Main process received SIGTERM, exiting Main process received SIGTERM, exiting Killing subprocess 17940 Killing subprocess 45573 Main 
process received SIGTERM, exiting Killing subprocess 34812 Main process received SIGTERM, exiting Killing subprocess 67157 Main process received SIGTERM, exiting Killing subprocess 39830 Main process received SIGTERM, exiting Killing subprocess 7955 Killing subprocess 53854 Main process received SIGTERM, exiting Main process received SIGTERM, exiting Killing subprocess 44147 Killing subprocess 57531 Killing subprocess 16972 Main process received SIGTERM, exiting Main process received SIGTERM, exiting Main process received SIGTERM, exiting Main process received SIGTERM, exiting Main process received SIGTERM, exiting Main process received SIGTERM, exiting Killing subprocess 45575 Killing subprocess 34813 Main process received SIGTERM, exiting Main process received SIGTERM, exiting Main process received SIGTERM, exiting Main process received SIGTERM, exiting Main process received SIGTERM, exiting Main process received SIGTERM, exiting Main process received SIGTERM, exiting Killing subprocess 34814 Main process received SIGTERM, exiting Killing subprocess 7956 Main process received SIGTERM, exiting Killing subprocess 42690 Killing subprocess 42691 Killing subprocess 42692 Killing subprocess 42693 Main process received SIGTERM, exiting Killing subprocess 7083 Killing subprocess 7084 Killing subprocess 7085 Killing subprocess 22811 Killing subprocess 7086 Killing subprocess 22812 Main process received SIGTERM, exiting Killing subprocess 22813 Killing subprocess 22814 Main process received SIGTERM, exiting Killing subprocess 13431 Killing subprocess 13432 Killing subprocess 13433 Killing subprocess 13434 Main process received SIGTERM, exiting Killing subprocess 72295 Killing subprocess 72296 Killing subprocess 72297 Killing subprocess 15401 Killing subprocess 72298 Killing subprocess 15402 Killing subprocess 15403 Killing subprocess 15405 Killing subprocess 52149 Main process received SIGTERM, exiting Main process received SIGTERM, exiting Killing subprocess 52150 Killing 
subprocess 52151 Killing subprocess 52152 Main process received SIGTERM, exiting Killing subprocess 38674 Killing subprocess 38675 Killing subprocess 33953 Killing subprocess 33954 Killing subprocess 38676 Killing subprocess 38677 Killing subprocess 33955 Main process received SIGTERM, exiting Killing subprocess 33957 Main process received SIGTERM, exiting Killing subprocess 65236 Killing subprocess 65237 Killing subprocess 65238 Killing subprocess 65239 Main process received SIGTERM, exiting srun: error: r8i1n2: task 43: Exited with exit code 1 srun: error: r9i5n7: task 109: Exited with exit code 1 srun: error: r9i6n0: task 111: Exited with exit code 1 srun: error: r9i0n1: task 65: Exited with exit code 1 srun: error: r9i0n3: task 67: Exited with exit code 1 srun: error: r8i0n2: task 34: Exited with exit code 1 srun: error: r7i6n2: task 20: Exited with exit code 1 srun: error: r9i2n8: task 87: Exited with exit code 1 srun: error: r8i0n8: task 40: Exited with exit code 1 srun: error: r9i3n1: task 89: Exited with exit code 1 srun: error: r9i4n1: task 95: Exited with exit code 1 srun: error: r9i3n0: task 88: Exited with exit code 1 srun: error: r6i6n0: task 2: Exited with exit code 1 srun: error: r8i3n2: task 49: Exited with exit code 1 srun: error: r8i0n7: task 39: Exited with exit code 1 srun: error: r9i6n7: task 118: Exited with exit code 1 srun: error: r8i7n6: task 61: Exited with exit code 1 srun: error: r8i7n4: task 59: Exited with exit code 1 srun: error: r9i5n4: task 106: Exited with exit code 1 srun: error: r8i0n6: task 38: Exited with exit code 1 srun: error: r9i5n8: task 110: Exited with exit code 1 srun: error: r9i0n4: task 68: Exited with exit code 1 srun: error: r9i4n3: task 97: Exited with exit code 1 srun: error: r8i1n0: task 41: Exited with exit code 1 srun: error: r7i7n7: task 30: Exited with exit code 1 srun: error: r9i2n3: task 82: Exited with exit code 1 srun: error: r9i6n8: task 119: Exited with exit code 1 srun: error: r8i7n7: task 62: Exited 
with exit code 1 srun: error: r9i5n5: task 107: Exited with exit code 1 srun: error: r9i2n6: task 85: Exited with exit code 1 srun: error: r7i6n4: task 22: Exited with exit code 1 srun: error: r9i1n2: task 74: Exited with exit code 1 srun: error: r9i0n0: task 64: Exited with exit code 1 srun: error: r9i0n5: task 69: Exited with exit code 1 srun: error: r8i2n8: task 46: Exited with exit code 1 srun: error: r9i4n2: task 96: Exited with exit code 1 srun: error: r7i3n2: task 17: Exited with exit code 1 srun: error: r9i3n7: task 92: Exited with exit code 1 srun: error: r9i0n2: task 66: Exited with exit code 1 srun: error: r9i1n3: task 75: Exited with exit code 1 srun: error: r8i1n4: task 45: Exited with exit code 1 srun: error: r8i7n5: task 60: Exited with exit code 1 srun: error: r9i2n5: task 84: Exited with exit code 1 srun: error: r7i7n8: task 31: Exited with exit code 1 srun: error: r8i0n5: task 37: Exited with exit code 1 srun: error: r8i7n3: task 58: Exited with exit code 1 srun: error: r7i6n3: task 21: Exited with exit code 1 srun: error: r9i1n1: task 73: Exited with exit code 1 srun: error: r9i3n8: task 93: Exited with exit code 1 srun: error: r8i7n8: task 63: Exited with exit code 1 srun: error: r8i3n0: task 47: Exited with exit code 1 srun: error: r8i0n3: task 35: Exited with exit code 1 srun: error: r9i4n0: task 94: Exited with exit code 1 srun: error: r9i5n3: task 105: Exited with exit code 1 srun: error: r8i1n3: task 44: Exited with exit code 1 srun: error: r8i6n6: task 57: Exited with exit code 1 srun: error: r8i0n0: task 32: Exited with exit code 1 srun: error: r9i5n6: task 108: Exited with exit code 1 srun: error: r9i2n4: task 83: Exited with exit code 1 srun: error: r8i3n1: task 48: Exited with exit code 1 srun: error: r7i2n5: task 15: Exited with exit code 1 srun: error: r9i1n0: task 72: Exited with exit code 1 srun: error: r7i5n7: task 18: Exited with exit code 1 srun: error: r6i5n8: task 1: Exited with exit code 1 srun: error: r8i3n8: task 51: Exited 
with exit code 1 srun: error: r8i0n4: task 36: Exited with exit code 1 srun: error: r8i0n1: task 33: Exited with exit code 1 srun: error: r7i7n2: task 26: Exited with exit code 1 srun: error: r8i3n3: task 50: Exited with exit code 1 srun: error: r7i7n6: task 29: Exited with exit code 1 srun: error: r7i6n1: task 19: Exited with exit code 1 srun: error: r7i6n8: task 23: Exited with exit code 1 srun: error: r9i2n0: task 81: Exited with exit code 1 srun: error: r9i4n6: task 100: Exited with exit code 1 srun: error: r8i6n2: task 54: Exited with exit code 1 srun: error: r9i3n2: task 90: Exited with exit code 1 srun: error: r8i6n3: task 55: Exited with exit code 1 srun: error: r7i7n0: task 24: Exited with exit code 1 srun: error: r8i4n0: task 52: Exited with exit code 1 srun: error: r9i1n8: task 80: Exited with exit code 1 srun: error: r8i4n1: task 53: Exited with exit code 1 srun: error: r8i1n1: task 42: Exited with exit code 1 srun: error: r9i5n2: task 104: Exited with exit code 1 srun: error: r9i0n8: task 71: Exited with exit code 1 srun: error: r9i5n1: task 103: Exited with exit code 1 srun: error: r7i7n1: task 25: Exited with exit code 1 srun: error: r9i4n4: task 98: Exited with exit code 1 srun: error: r7i7n4: task 28: Exited with exit code 1 srun: error: r9i0n6: task 70: Exited with exit code 1 srun: error: r9i1n7: task 79: Exited with exit code 1 srun: error: r9i2n7: task 86: Exited with exit code 1 srun: error: r9i1n6: task 78: Exited with exit code 1 srun: error: r9i5n0: task 102: Exited with exit code 1 srun: error: r9i3n6: task 91: Exited with exit code 1 srun: error: r9i1n5: task 77: Exited with exit code 1 srun: error: r7i2n8: task 16: Exited with exit code 1 srun: error: r9i4n8: task 101: Exited with exit code 1 srun: error: r9i4n5: task 99: Exited with exit code 1 srun: error: r7i2n1: task 14: Exited with exit code 1 srun: error: r7i0n0: task 5: Exited with exit code 1 srun: error: r9i6n6: task 117: Exited with exit code 1 srun: error: r9i7n6: task 125: 
Exited with exit code 1 srun: error: r9i7n4: task 123: Exited with exit code 1 srun: error: r6i7n8: task 4: Exited with exit code 1 srun: error: r9i6n2: task 113: Exited with exit code 1 srun: error: r9i6n3: task 114: Exited with exit code 1 srun: error: r6i7n7: task 3: Exited with exit code 1 srun: error: r7i0n5: task 10: Exited with exit code 1 srun: error: r9i7n5: task 124: Exited with exit code 1 srun: error: r7i1n8: task 12: Exited with exit code 1 srun: error: r9i7n7: task 126: Exited with exit code 1 srun: error: r7i0n2: task 7: Exited with exit code 1 srun: error: r7i0n3: task 8: Exited with exit code 1 srun: error: r9i7n8: task 127: Exited with exit code 1 srun: error: r7i2n0: task 13: Exited with exit code 1 srun: error: r7i1n7: task 11: Exited with exit code 1 srun: error: r9i6n1: task 112: Exited with exit code 1 srun: error: r7i0n4: task 9: Exited with exit code 1 srun: error: r9i1n4: task 76: Exited with exit code 1 srun: error: r9i7n2: task 121: Exited with exit code 1 srun: error: r9i6n4: task 115: Exited with exit code 1 srun: error: r7i0n1: task 6: Exited with exit code 1 srun: error: r9i7n3: task 122: Exited with exit code 1 srun: error: r8i6n5: task 56: Exited with exit code 1 srun: error: r9i6n5: task 116: Exited with exit code 1 srun: error: r7i7n3: task 27: Exited with exit code 1 srun: error: r9i7n1: task 120: Exited with exit code 1
Killing subprocess 32020 Killing subprocess 32021 Killing subprocess 32022 Killing subprocess 32023 Main process received SIGTERM, exiting Killing subprocess 2391 Killing subprocess 2392 Killing subprocess 2393 Killing subprocess 2395 Main process received SIGTERM, exiting Killing subprocess 8155 Killing subprocess 8156 Killing subprocess 8157 Killing subprocess 8158 Main process received SIGTERM, exiting Killing subprocess 1105 Killing subprocess 1106 Killing subprocess 1107 Killing subprocess 1108 Main process received SIGTERM, exiting Killing subprocess 61308 Killing subprocess 70292 Killing subprocess 42836 Killing subprocess 70293 Killing subprocess 70294 Killing subprocess 30001 Killing subprocess 61309 Killing subprocess 61310 Killing subprocess 61312 Killing subprocess 42837 Killing subprocess 57225 Killing subprocess 70296 Killing subprocess 30002 Killing subprocess 30003 Killing subprocess 13020 Main process received SIGTERM, exiting Killing subprocess 40485 Killing subprocess 72254 Killing subprocess 42838 Killing subprocess 42840 Main process received SIGTERM, exiting Killing subprocess 57226 Killing subprocess 57227 Main process received SIGTERM, exiting Killing subprocess 76054 Killing subprocess 30004 Main process received SIGTERM, exiting Killing subprocess 13021 Killing subprocess 40486 Killing subprocess 14664 Killing subprocess
72255 Killing subprocess 57228 Main process received SIGTERM, exiting Killing subprocess 16769 Killing subprocess 76055 Killing subprocess 76056 Killing subprocess 13022 Killing subprocess 13023 Main process received SIGTERM, exiting Killing subprocess 40487 Killing subprocess 40488 Main process received SIGTERM, exiting Killing subprocess 14665 Killing subprocess 14666 Killing subprocess 14668 Killing subprocess 72256 Killing subprocess 72258 Main process received SIGTERM, exiting Killing subprocess 16770 Killing subprocess 16771 Killing subprocess 76057 Main process received SIGTERM, exiting Killing subprocess 60803 Main process received SIGTERM, exiting Killing subprocess 66379 Killing subprocess 16772 Main process received SIGTERM, exiting Killing subprocess 60804 Killing subprocess 66380 Killing subprocess 66381 Killing subprocess 13204 Killing subprocess 60805 Killing subprocess 60806 Main process received SIGTERM, exiting Killing subprocess 66382 Main process received SIGTERM, exiting Killing subprocess 13205 Killing subprocess 13206 Killing subprocess 13207 Killing subprocess 33516 Killing subprocess 33006 Main process received SIGTERM, exiting Killing subprocess 33517 Killing subprocess 33518 Killing subprocess 33520 Killing subprocess 33007 Killing subprocess 33008 Killing subprocess 33009 Killing subprocess 72301 Killing subprocess 16814 Main process received SIGTERM, exiting Killing subprocess 59087 Killing subprocess 74735 Killing subprocess 13261 Main process received SIGTERM, exiting Killing subprocess 55620 Killing subprocess 72302 Killing subprocess 16815 Killing subprocess 59088 Killing subprocess 74736 Killing subprocess 74737 Killing subprocess 74738 Main process received SIGTERM, exiting Killing subprocess 13262 Killing subprocess 55621 Killing subprocess 55622 Killing subprocess 72303 Killing subprocess 72304 Main process received SIGTERM, exiting slurmstepd: error: *** STEP 1271130.0 ON r7i6n1 CANCELLED AT 2021-09-27T17:43:09 *** Killing 
subprocess 5069 Killing subprocess 16816 Killing subprocess 16817 Main process received SIGTERM, exiting Killing subprocess 59089 Killing subprocess 59090 Main process received SIGTERM, exiting Killing subprocess 36826 Killing subprocess 13263 Killing subprocess 13264 Main process received SIGTERM, exiting Killing subprocess 55623 Main process received SIGTERM, exiting Killing subprocess 72745 Killing subprocess 5070 Killing subprocess 5071 Killing subprocess 36827 Killing subprocess 36828 Killing subprocess 22929 Killing subprocess 5072 Main process received SIGTERM, exiting Killing subprocess 23020 Killing subprocess 39440 Killing subprocess 36829 Main process received SIGTERM, exiting Killing subprocess 72746 Killing subprocess 23021 Killing subprocess 39441 Killing subprocess 22930 Killing subprocess 22931 Killing subprocess 60544 Killing subprocess 72747 Killing subprocess 72748 Main process received SIGTERM, exiting Killing subprocess 23022 Killing subprocess 23023 Main process received SIGTERM, exiting Killing subprocess 39442 Killing subprocess 39443 Main process received SIGTERM, exiting Killing subprocess 4007 Killing subprocess 22932 Main process received SIGTERM, exiting Killing subprocess 60545 Killing subprocess 38454 Killing subprocess 31565 Killing subprocess 62249 Killing subprocess 4008 Killing subprocess 4009 Killing subprocess 60546 Killing subprocess 60547 Main process received SIGTERM, exiting Killing subprocess 38455 Killing subprocess 38456 Killing subprocess 65136 Killing subprocess 31566 Killing subprocess 31567 Killing subprocess 31568 Main process received SIGTERM, exiting Killing subprocess 14739 Killing subprocess 62250 Killing subprocess 62251 Killing subprocess 31604 Killing subprocess 4010 Main process received SIGTERM, exiting Killing subprocess 38457 Main process received SIGTERM, exiting Killing subprocess 65137 Killing subprocess 14740 Killing subprocess 14741 Killing subprocess 62252 Main process received SIGTERM, exiting 
Killing subprocess 31605 Killing subprocess 65138 Killing subprocess 65139 Main process received SIGTERM, exiting Killing subprocess 14743 Main process received SIGTERM, exiting Killing subprocess 31606 Killing subprocess 31607 Main process received SIGTERM, exiting Killing subprocess 3548 Killing subprocess 54160 Killing subprocess 3549 Killing subprocess 3550 Killing subprocess 54161 Killing subprocess 54162 Killing subprocess 54164 Main process received SIGTERM, exiting Killing subprocess 33462 Killing subprocess 37254 Killing subprocess 62641 Killing subprocess 3552 Main process received SIGTERM, exiting Killing subprocess 33463 Killing subprocess 33464 Killing subprocess 78252 Killing subprocess 37255 Killing subprocess 37256 Killing subprocess 62642 Killing subprocess 62643 Killing subprocess 62644 Killing subprocess 33465 Main process received SIGTERM, exiting Killing subprocess 78253 Killing subprocess 78254 Killing subprocess 78255 Killing subprocess 37257 Main process received SIGTERM, exiting Main process received SIGTERM, exiting Killing subprocess 71588 Killing subprocess 52835 Killing subprocess 66284 Main process received SIGTERM, exiting Killing subprocess 71589 Killing subprocess 52836 Killing subprocess 66285 Killing subprocess 71590 Killing subprocess 71591 Main process received SIGTERM, exiting Killing subprocess 73370 Killing subprocess 52837 Killing subprocess 52838 Main process received SIGTERM, exiting Killing subprocess 66286 Killing subprocess 66287 Main process received SIGTERM, exiting Killing subprocess 70128 Killing subprocess 73371 Killing subprocess 73372 Killing subprocess 76744 Killing subprocess 73373 Main process received SIGTERM, exiting Killing subprocess 70129 Killing subprocess 70130 Killing subprocess 70132 Main process received SIGTERM, exiting Killing subprocess 76745 Killing subprocess 76746 Killing subprocess 76748 Main process received SIGTERM, exiting Killing subprocess 42114 Killing subprocess 42115 Killing subprocess 
42116 Killing subprocess 42117 Main process received SIGTERM, exiting Killing subprocess 22439 Killing subprocess 22440 Killing subprocess 22441 Killing subprocess 22442 Killing subprocess 6741 Main process received SIGTERM, exiting Killing subprocess 6742 Killing subprocess 6743 Killing subprocess 6744 Killing subprocess 27342 Main process received SIGTERM, exiting Killing subprocess 4903 Killing subprocess 27343 Killing subprocess 7749 Killing subprocess 4904 Killing subprocess 4905 Killing subprocess 4906 Main process received SIGTERM, exiting Killing subprocess 27344 Killing subprocess 27345 Main process received SIGTERM, exiting Killing subprocess 7750 Killing subprocess 7751 Killing subprocess 7753 Main process received SIGTERM, exiting Killing subprocess 78894 Killing subprocess 78895 Killing subprocess 78896 Killing subprocess 78897 Main process received SIGTERM, exiting Killing subprocess 24072 Killing subprocess 7177 Killing subprocess 24073 Killing subprocess 7178 Killing subprocess 24074 Killing subprocess 24075 Main process received SIGTERM, exiting Killing subprocess 7179 Killing subprocess 7180 Main process received SIGTERM, exiting Killing subprocess 78710 Killing subprocess 78711 Killing subprocess 78712 Killing subprocess 78713 Main process received SIGTERM, exiting Killing subprocess 66743 Killing subprocess 66744 Killing subprocess 66745 Killing subprocess 66751 Main process received SIGTERM, exiting Killing subprocess 65099 Killing subprocess 65100 Killing subprocess 65101 Killing subprocess 65103 Main process received SIGTERM, exiting srun: Job step aborted: Waiting up to 62 seconds for job step to finish.
***************************************** ***************************************** Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. ***************************************** ***************************************** Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. ***************************************** ***************************************** Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. ***************************************** ***************************************** Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. ***************************************** ***************************************** Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. ***************************************** ***************************************** Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. 
--------------------------------------------------
DeepSpeed C++/CUDA extension op report
--------------------------------------------------
NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.
--------------------------------------------------
JIT compiled ops requires ninja
--------------------------------------------------
ninja .................. [OKAY]
--------------------------------------------------
op name ................ installed .. compatible
--------------------------------------------------
cpu_adam ............... [YES] ...... [OKAY]
fused_adam ............. [NO] ....... [OKAY]
fused_lamb ............. [NO] ....... [OKAY]
sparse_attn ............ [NO] ....... [OKAY]
transformer ............ [NO] ....... [OKAY]
stochastic_transformer . [NO] ....... [OKAY]
--------------------------------------------------
 [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`.
async_io ............... [NO] ....... [NO]
transformer_inference .. [NO] ....... [OKAY]
utils .................. [YES] ...... [OKAY]
quantizer .............. [NO] ....... [OKAY]
--------------------------------------------------
DeepSpeed general environment info:
torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']
torch version .................... 1.8.1
torch cuda version ............... 11.1
nvcc version ..................... 11.2
deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']
deepspeed info ................... 0.4.2+bc17042, bc17042, big-science
deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1
/bin/sh: line 0: type: git: not found
**** Git info for Megatron: git_hash=unknown git_branch=unknown ****
[OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- -------------------------------------------------- ----------------------------------------------------------------------------------------------------DeepSpeed C++/CUDA extension op report -------------------------------------------------- DeepSpeed C++/CUDA extension op report-------------------------------------------------- DeepSpeed C++/CUDA extension op report NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.--------------------------------------------------DeepSpeed C++/CUDA extension op report-------------------------------------------------- --------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. 
Op compatibility means that your system meet the required dependencies to JIT install the op. --------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. JIT compiled ops requires ninja -------------------------------------------------- -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. JIT compiled ops requires ninja JIT compiled ops requires ninja -------------------------------------------------- JIT compiled ops requires ninja ------------------------------------------------------------------------------------------------------------------------------------------------------ -------------------------------------------------- DeepSpeed C++/CUDA extension op reportDeepSpeed C++/CUDA extension op report DeepSpeed C++/CUDA extension op report -------------------------------------------------- DeepSpeed C++/CUDA extension op report ----------------------------------------------------------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.-------------------------------------------------- --------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. 
Op compatibility means that your system meet the required dependencies to JIT install the op.JIT compiled ops requires ninja---------------------------------------------------------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- JIT compiled ops requires ninja JIT compiled ops requires ninja ninjaninjaninjaninja .................. .................. .................. ..................[OKAY][OKAY] [OKAY][OKAY]-------------------------------------------------- -------------------------------------------------- ---------------------------------------------------------------------------------------------------- op name op name................op nameop name ................................installed................ installed..installedinstalled compatible .... .. --------------------------------------------------compatible compatible compatible ---------------------------------------------------------------------------------------------------- -------------------------------------------------- cpu_adam cpu_adam...............cpu_adam [YES]cpu_adam.............................. [YES]......[YES] ............... ............[YES] [OKAY] [OKAY][OKAY]...... [OKAY] fused_adamfused_adam fused_adam .............fused_adam ............. ............. [NO]............. [NO][NO] ....... [NO] .............. [OKAY]....... [OKAY] [OKAY] [OKAY]fused_lamb fused_lambfused_lamb ............. fused_lamb ..........................[NO] ............. [NO] [NO]....... [NO] ....... ....... [OKAY].......[OKAY] [OKAY] [OKAY] sparse_attnsparse_attn sparse_attn ............sparse_attn ............[NO] ............ ................... [NO] [OKAY][NO] [NO] ....... ....... ....... [OKAY]transformer[OKAY] [OKAY]............ transformer[NO]transformer ...............................transformer [OKAY][NO]............ [NO] .......[NO] .......stochastic_transformer....... [OKAY] [OKAY] [OKAY]. 
[NO]stochastic_transformerstochastic_transformer stochastic_transformer....... . . [OKAY] [NO]. [NO] ....... [NO].......[OKAY] .......[OKAY] [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- ninjaninjaninjaninja ........................................................................ [OKAY][OKAY][OKAY][OKAY] ------------------------------------------------------------------------------------------------------------------------------------------------------ -------------------------------------------------- op nameop nameop nameop name ................ ................................ ................installedinstalledinstalled ..installed.. .. ..compatiblecompatible compatiblecompatible-------------------------------------------------- -------------------------------------------------- ---------------------------------------------------------------------------------------------------- cpu_adam ...............cpu_adamcpu_adam cpu_adam [YES].............................. ......[YES]...............[YES] ............[YES] [OKAY]......[OKAY] [OKAY] [OKAY] fused_adam fused_adam............. fused_adam ............. fused_adam .............[NO][NO] .......[NO] ....... [OKAY] ....................[OKAY]fused_lamb [OKAY] ............. [NO]fused_lamb fused_lamb....................[NO] [NO] .................... .......[OKAY] [NO][OKAY] [OKAY] ....... [OKAY] fused_lamb ............. [NO]sparse_attn .......sparse_attnsparse_attn............ ............ ............ [NO] [NO] [NO]....... [OKAY] ....... [OKAY]....... [OKAY][OKAY]transformer ............ transformertransformer[NO] ............ ................... [NO][NO][OKAY] .......sparse_attn....... 
stochastic_transformer[OKAY] ............[OKAY]. [NO]stochastic_transformer .......stochastic_transformer[NO]. [OKAY] [NO] . ..............[NO] [OKAY][OKAY]....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ------------------------------------------------------------------------------------------------------------------------------------------------------ DeepSpeed C++/CUDA extension op report DeepSpeed C++/CUDA extension op report DeepSpeed C++/CUDA extension op report---------------------------------------------------------------------------------------------------- --------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. 
Op compatibility means that your system meet the required dependencies to JIT install the op.---------------------------------------------------------------------------------------------------- --------------------------------------------------JIT compiled ops requires ninjaJIT compiled ops requires ninja JIT compiled ops requires ninja ninjaninjaninjaninja .................................... .................. .................. [OKAY][OKAY] [OKAY][OKAY] ---------------------------------------------------------------------------------------------------- -------------------------------------------------- -------------------------------------------------- op nameop nameop name op name ................ ................................................ installed installed installedinstalled .. .. .. ..compatible compatible compatible compatible -------------------------------------------------- -------------------------------------------------- ---------------------------------------------------------------------------------------------------- cpu_adam ...............cpu_adam cpu_adam [YES] ...............cpu_adam............... .....................[YES][YES] [YES] [OKAY] ...... ............ [OKAY][OKAY][OKAY] fused_adam ............. [NO] .......fused_adamfused_adam fused_adam[OKAY]............. ..........................[NO] [NO]....... fused_lamb[NO] ....... .............[OKAY] ....... [OKAY] [NO] [OKAY] fused_lamb....... fused_lamb .............fused_lamb [OKAY] .......................... [NO] [NO] [NO] ....... .............. [OKAY][OKAY][OKAY] sparse_attn ............ [NO] ....... [OKAY] transformersparse_attn sparse_attn sparse_attn ........................ ............ ............[NO] [NO][NO] [NO].............. .......[OKAY] [OKAY]....... [OKAY] [OKAY]transformer stochastic_transformertransformer............ transformer ............ [NO] ............. [NO] .......[NO] [NO][OKAY]....... ....... ....... 
[OKAY] [OKAY][OKAY]stochastic_transformer stochastic_transformer.stochastic_transformer [NO].. .......[NO][NO] .......[OKAY]....... [OKAY][OKAY] ---------------------------------------------------------------------------------------------------- DeepSpeed C++/CUDA extension op reportDeepSpeed C++/CUDA extension op report -------------------------------------------------- -------------------------------------------------- --------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja DeepSpeed C++/CUDA extension op report--------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.-------------------------------------------------- ----------------------------------------------------------------------------------------------------DeepSpeed C++/CUDA extension op reportJIT compiled ops requires ninja NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.JIT compiled ops requires ninja-------------------------------------------------- --------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. 
Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.-------------------------------------------------- -------------------------------------------------- --------------------------------------------------JIT compiled ops requires ninjaDeepSpeed C++/CUDA extension op report DeepSpeed C++/CUDA extension op report-------------------------------------------------- --------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. JIT compiled ops requires ninja -------------------------------------------------- JIT compiled ops requires ninja NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. ---------------------------------------------------------------------------------------------------- JIT compiled ops requires ninjaJIT compiled ops requires ninja ------------------------------------------------------------------------------------------------------------------------------------------------------ DeepSpeed C++/CUDA extension op reportDeepSpeed C++/CUDA extension op reportDeepSpeed C++/CUDA extension op report -------------------------------------------------- ----------------------------------------------------------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. 
-------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.-------------------------------------------------- DeepSpeed C++/CUDA extension op report--------------------------------------------------JIT compiled ops requires ninja-------------------------------------------------- --------------------------------------------------JIT compiled ops requires ninjaJIT compiled ops requires ninja NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- --------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. ----------------------------------------------------------------------------------------------------DeepSpeed C++/CUDA extension op report-------------------------------------------------- JIT compiled ops requires ninjaDeepSpeed C++/CUDA extension op reportDeepSpeed C++/CUDA extension op report -------------------------------------------------- ---------------------------------------------------------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. 
Op compatibility means that your system meet the required dependencies to JIT install the op.NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. ------------------------------------------------------------------------------------------------------------------------------------------------------ JIT compiled ops requires ninjaJIT compiled ops requires ninjaJIT compiled ops requires ninja ---------------------------------------------------------------------------------------------------- DeepSpeed C++/CUDA extension op reportDeepSpeed C++/CUDA extension op report ------------------------------------------------------------------------------------------------------------------------------------------------------ NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. DeepSpeed C++/CUDA extension op report-------------------------------------------------- -------------------------------------------------- --------------------------------------------------JIT compiled ops requires ninjaJIT compiled ops requires ninja-------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. 
DeepSpeed C++/CUDA extension op report-------------------------------------------------- --------------------------------------------------JIT compiled ops requires ninja NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. ------------------------------------------------------------------------------------------------------------------------------------------------------ JIT compiled ops requires ninja DeepSpeed C++/CUDA extension op report DeepSpeed C++/CUDA extension op report -------------------------------------------------- ----------------------------------------------------------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. --------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. 
DeepSpeed C++/CUDA extension op report JIT compiled ops requires ninja ---------------------------------------------------------------------------------------------------- JIT compiled ops requires ninjaNOTE: Ops not installed will be just-in-time (JIT) compiled at transformer_inference .. [NO] ....... [OKAY] runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- ninjaninjaninjaninja .................. .................. .................................... [OKAY] [OKAY][OKAY] [OKAY] -------------------------------------------------- ------------------------------------------------------------------------------------------------------------------------------------------------------op name op nameop name................ op name ................ installed................ ................ installed ..installed installed.. compatible....compatible compatible--------------------------------------------------compatible ---------------------------------------------------------------------------------------------------- -------------------------------------------------- cpu_adam ...............cpu_adam cpu_adam cpu_adam[YES] ............... ............... ..................... [YES] [YES] [OKAY] [YES] ...... ...... ...... [OKAY] [OKAY] [OKAY] ninjaninjaninjaninja ........................................................................ 
[OKAY][OKAY][OKAY][OKAY] fused_adam ............. [NO] ....... fused_adam[OKAY] ------------------------------------------------------------------------------------------------------------------------------------------------------ -------------------------------------------------- op name ninjaninjaninjaninja .................................... .................................... [OKAY] [OKAY][OKAY][OKAY] -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- fused_adam fused_adam ............. .............fused_lamb[NO]............. .......[NO].............[NO] [OKAY] .......[NO] ....... .......[OKAY][OKAY]fused_lamb op name ................op nameop name................ installed ................installed.................. ..compatibleinstalledinstalled compatible [OKAY].............fused_lamb ..--------------------------------------------------..-------------------------------------------------- compatiblecompatible ---------------------------------------------------------------------------------------------------- op nameop nameop name op name................................ installedinstalled................ ................ .... installed installed compatiblecompatible.. fused_lamb [NO]............. ............. ....... [NO] [NO][OKAY] .............. [OKAY][OKAY]sparse_attn cpu_adam ...............cpu_adam [YES]............... cpu_adam ......cpu_adam [YES] ...............[OKAY] ............... ..-------------------------------------------------- -------------------------------------------------- compatiblecompatible ---------------------------------------------------------------------------------------------------- ............ [NO] ....... [OKAY] ......[YES] [YES]......[OKAY] ......[OKAY] cpu_adam cpu_adam............... ...............cpu_adamcpu_adam[YES] ...............[YES]..................... ...... 
[OKAY][YES] sparse_attn ............transformer sparse_attn [NO]sparse_attn ............ ...............................[NO] [OKAY] [NO].......[NO] fused_adam[OKAY] ............. [NO] ....... [OKAY] [YES][OKAY] .......transformer[OKAY] ....... [OKAY] ............ fused_adam fused_adam.............fused_lamb fused_adam [NO].......................... ....................[NO][NO] [NO][OKAY].............. .......[OKAY][OKAY] fused_lamb [OKAY] ............ [OKAY][OKAY] stochastic_transformer[OKAY][NO] .............fused_lamb [NO]fused_lamb............. ....... ............. sparse_attn[NO] [OKAY] [NO] ................... ....... [NO] [OKAY] [OKAY] ....... fused_adam ............. [NO]fused_adam .................... fused_adam fused_adam[OKAY] [NO] transformer .transformer....... ............ [NO]............ [OKAY] .......[NO] [NO] [OKAY]....... [OKAY] ............. ............. ....... [NO]fused_lamb [NO] [OKAY] ............. ....... ....... [NO] [OKAY] [OKAY] fused_lamb....... .......stochastic_transformer [OKAY][OKAY] . [NO] .......stochastic_transformer stochastic_transformer [OKAY] sparse_attn transformer............ ............[NO]sparse_attnsparse_attn [NO]................... ............ [OKAY] [NO].......[NO] .......[OKAY]transformer ....... .............[OKAY] fused_lamb fused_lamb [NO] ............. .................... [NO][NO][OKAY] .. [NO][NO] .............. [OKAY][OKAY] [OKAY] ............ [OKAY]stochastic_transformer[NO] .............. sparse_attn[OKAY][OKAY] transformer........ transformer[OKAY][NO]............ ............ [NO] ....... [OKAY] ...................[NO] stochastic_transformer.......[OKAY][NO] sparse_attn ............transformer [NO]sparse_attnsparse_attn............ ....... ........................[OKAY][NO] ....... [NO][NO] transformer[OKAY]....... ....... ........ [OKAY] [NO] [OKAY] ....... [OKAY] ninjaninjaninjaninja .................. .................. .................. 
--------------------------------------------------
DeepSpeed C++/CUDA extension op report
--------------------------------------------------
NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meets the required dependencies to JIT install the op.
--------------------------------------------------
JIT compiled ops requires ninja
--------------------------------------------------
ninja .................. [OKAY]
--------------------------------------------------
op name ................ installed .. compatible
--------------------------------------------------
cpu_adam ............... [YES] ...... [OKAY]
fused_adam ............. [NO] ....... [OKAY]
fused_lamb ............. [NO] ....... [OKAY]
sparse_attn ............ [NO] ....... [OKAY]
transformer ............ [NO] ....... [OKAY]
stochastic_transformer . [NO] ....... [OKAY]
--------------------------------------------------
 [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`.
async_io ............... [NO] ....... [NO]
transformer_inference .. [NO] ....... [OKAY]
utils .................. [YES] ...... [OKAY]
quantizer .............. [NO] ....... [OKAY]
--------------------------------------------------
DeepSpeed general environment info:
torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']
torch version .................... 1.8.1
torch cuda version ............... 11.1
nvcc version ..................... 11.2
deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']
deepspeed info ................... 0.4.2+bc17042, bc17042, big-science
deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1
[identical op reports, warnings, and environment info printed by the remaining ranks elided]
...............[YES]......[YES] [YES]......[OKAY] ...... [OKAY]...... [OKAY][OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report ---------------------------------------------------------------------------------------------------- --------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. fused_adam fused_adam.............fused_adam fused_adam [NO] ............. .................................[NO] [NO][OKAY][NO]....... DeepSpeed C++/CUDA extension op report--------------------------------------------------DeepSpeed C++/CUDA extension op report --------------------------------------------------JIT compiled ops requires ninja --------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.-------------------------------------------------- DeepSpeed C++/CUDA extension op reportJIT compiled ops requires ninja-------------------------------------------------- -------------------------------------------------- JIT compiled ops requires ninja NOTE: Ops not installed will be just-in-time (JIT) compiled at ....... ....... fused_lamb [OKAY][OKAY] [OKAY] ............. runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja [NO] fused_lamb.......fused_lamb fused_lamb ............. .............[OKAY] ............. [NO] [NO] [NO] ....... ....... 
....... [OKAY] [OKAY][OKAY] ninjaninjaninjaninja ........................................................................ [OKAY][OKAY][OKAY] sparse_attn ............ [NO] ....... sparse_attn[OKAY]sparse_attn [OKAY]-------------------------------------------------- op name------------------------------------------------------------------------------------------------------------------------------------------------------ ................op name op nameop nameinstalled ................ .................. ................ installedcompatibleinstalled ..installed-------------------------------------------------- .. compatible .. sparse_attn ........................transformer............ ............[NO][NO][NO] [NO]....... ....... ....... .......[OKAY] [OKAY][OKAY][OKAY] compatible --------------------------------------------------compatible-------------------------------------------------- -------------------------------------------------- transformertransformer transformerstochastic_transformer............ .........................[NO] [NO][NO][NO] ....... ..................... [OKAY] [OKAY] [OKAY][OKAY] cpu_adam ............... [YES] ...... cpu_adamcpu_adam[OKAY]cpu_adam ............... ............... ............... [YES] [YES] [YES] ...... ...... ...... [OKAY] [OKAY] [OKAY] stochastic_transformerstochastic_transformerstochastic_transformer ... [NO][NO][NO] ..................... [OKAY][OKAY][OKAY] fused_adam ............. [NO] ....... [OKAY]fused_adam fused_adam fused_adam ............. .............fused_lamb ............. [NO] [NO] .............[NO] ....... ....... [NO] ....... [OKAY] [OKAY] ....... [OKAY] [OKAY] fused_lambfused_lamb ..........................fused_lamb [NO][NO]............. ..............[NO] [OKAY][OKAY]....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] sparse_attn transformersparse_attn ............ ........................sparse_attn [NO][NO][NO] ................... .......[OKAY] ....... [OKAY] [NO] [OKAY] ....... 
stochastic_transformer[OKAY]transformertransformer ......................... transformer [NO][NO][NO] ............ .............. [NO]....... [OKAY] [OKAY] .......[OKAY] [OKAY] stochastic_transformerstochastic_transformer stochastic_transformer .. . [NO] [NO] [NO] ....... ....... ....... [OKAY] [OKAY] [OKAY] ninjaninjaninjaninja .................................... ....................................[OKAY][OKAY] [OKAY] [OKAY]-------------------------------------------------- ---------------------------------------------------------------------------------------------------- --------------------------------------------------op name op name op name................................op name ................installed installed ................ ..installed .. compatibleinstalled.. compatible..--------------------------------------------------compatible -------------------------------------------------- compatible -------------------------------------------------- -------------------------------------------------- cpu_adam ............... cpu_adam[YES] cpu_adam..................... cpu_adam ...............[OKAY] [YES] .....................[YES] [OKAY] [YES] ......fused_adam ......[OKAY]............. [OKAY][NO]fused_adam ....... .............[OKAY] fused_adam[NO] fused_lamb ................................. fused_adam [OKAY][NO].............[NO] [NO] ....... ....... .......fused_lamb [OKAY][OKAY] .............[OKAY] [NO]fused_lamb fused_lamb ....... ............. ............. [OKAY] [NO] [NO]sparse_attn .......................... [OKAY][OKAY] sparse_attn[NO] ................... [NO][OKAY] ....... [OKAY] transformer ............transformer sparse_attn sparse_attn[NO] ........................................... [NO][OKAY][NO][NO] ....... .......stochastic_transformer[OKAY] ....... [OKAY] [OKAY]. transformer [NO] transformer ................... stochastic_transformer [NO] ............. [OKAY] [NO]....... [NO] .......[OKAY]....... 
[OKAY] [OKAY] stochastic_transformer .stochastic_transformer [NO] ........ [NO][OKAY] ....... [OKAY] ninjaninjaninjaninja ........................................................................ [OKAY][OKAY] [OKAY] [OKAY] ---------------------------------------------------------------------------------------------------- --------------------------------------------------op nameop name-------------------------------------------------- ................................ op name op nameinstalledinstalled .................... ................ compatibleinstalled compatible -------------------------------------------------- installed ..-------------------------------------------------- ..compatible compatible-------------------------------------------------- --------------------------------------------------cpu_adam ...............cpu_adam [YES]............... ......[YES]cpu_adam ......[OKAY] ............... [OKAY]cpu_adam [YES]............... ......[YES] [OKAY]...... [OKAY]fused_adam fused_adam............. .............[NO] [NO]....... fused_adam .......[OKAY]fused_adam ..........................[OKAY] [NO][NO]fused_lamb ..............fused_lamb............. .............[OKAY][NO][OKAY] [NO]....... .......[OKAY]fused_lambfused_lamb [OKAY] ............. ............. [NO] ....... [OKAY] [NO] ....... [OKAY]sparse_attn sparse_attn............ ............[NO] [NO]sparse_attn....... ....... ............[OKAY][OKAY] sparse_attn [NO] ............transformer....... transformer [NO] [OKAY]............ ............ ....... [NO] [NO] transformer.......[OKAY] .......[OKAY]............ transformer[OKAY] [NO]............ .......stochastic_transformer[NO] [OKAY]stochastic_transformer ........ [NO][OKAY] .stochastic_transformer....... [NO][OKAY] ........stochastic_transformer [OKAY][NO] . .......[NO] [OKAY]....... 
[OKAY] ---------------------------------------------------------------------------------------------------- DeepSpeed C++/CUDA extension op reportDeepSpeed C++/CUDA extension op report -------------------------------------------------- -------------------------------------------------- -------------------------------------------------- -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.DeepSpeed C++/CUDA extension op reportNOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. DeepSpeed C++/CUDA extension op report------------------------------------------------------------------------------------------------------------------------------------------------------ JIT compiled ops requires ninjaJIT compiled ops requires ninjaNOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.-------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.-------------------------------------------------- --------------------------------------------------JIT compiled ops requires ninja JIT compiled ops requires ninja ninjaninjaninjaninja ........................................................................ [OKAY] [OKAY] [OKAY][OKAY] -------------------------------------------------- ---------------------------------------------------------------------------------------------------- -------------------------------------------------- op name op nameop name................ 
op name ................................installed installed................ installed.. installed .... compatible compatible.. compatible --------------------------------------------------compatible ---------------------------------------------------------------------------------------------------- -------------------------------------------------- cpu_adam ...............cpu_adam cpu_adam [YES]cpu_adam ............... .................................... [YES] [OKAY] [YES][YES]...... ...... ......[OKAY] [OKAY] [OKAY] fused_adam ............. [NO] ....... fused_adam[OKAY] .............fused_adamfused_adam [NO]..........................fused_lamb ....................[NO][NO] [NO][OKAY].............. [OKAY][OKAY]....... [OKAY]fused_lamb .............fused_lambfused_lamb [NO] ............. ............. ....... [NO] [NO] [OKAY]sparse_attn ....... ....... ............[OKAY][OKAY] [NO] ....... [OKAY] sparse_attn ............transformer [NO]............ .......sparse_attn [NO] [OKAY]sparse_attn................... [NO][OKAY]............ transformer ....... [NO] stochastic_transformer [OKAY] ....... ............ . [OKAY] [NO] [NO]....... transformertransformer [OKAY]....... ........................ [OKAY] [NO]stochastic_transformer [NO]....... ........[OKAY] [NO][OKAY] ....... [OKAY]stochastic_transformer stochastic_transformer .. [NO] [NO]....... [OKAY]....... [OKAY] ------------------------------------------------------------------------------------------------------------------------------------------------------ -------------------------------------------------- DeepSpeed C++/CUDA extension op reportDeepSpeed C++/CUDA extension op report DeepSpeed C++/CUDA extension op report---------------------------------------------------------------------------------------------------- DeepSpeed C++/CUDA extension op report NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. 
Op compatibility means that your system meet the required dependencies to JIT install the op. NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. ---------------------------------------------------------------------------------------------------- -------------------------------------------------- -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. JIT compiled ops requires ninjaNOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. JIT compiled ops requires ninja---------------------------------------------------------------------------------------------------- JIT compiled ops requires ninjaJIT compiled ops requires ninja ninjaninjaninjaninja ........................................................................ [OKAY][OKAY][OKAY][OKAY] ------------------------------------------------------------------------------------------------------------------------------------------------------ -------------------------------------------------- op nameop nameop name op name ................ ................................................installed installedinstalledinstalled.. .. ....compatible compatiblecompatible-------------------------------------------------- compatible---------------------------------------------------------------------------------------------------- -------------------------------------------------- cpu_adam ...............cpu_adam cpu_adam[YES] ............... ......cpu_adam ...............[YES] [OKAY] .....................[YES] [OKAY]...... [YES] [OKAY]......fused_adam [OKAY]............. [NO]fused_adam .................... 
[OKAY][NO] fused_adam ....... fused_lamb[OKAY]............. fused_adam.............[NO]fused_lamb [NO]................................. .......[NO] [NO][OKAY].......[OKAY] .......[OKAY] [OKAY]fused_lamb ............. fused_lamb[NO] .................... [NO][OKAY] sparse_attn....... sparse_attn ............ [OKAY]............ [NO] [NO]....... .......[OKAY] sparse_attn[OKAY] transformer............ ............transformer[NO] sparse_attn [NO]................... ............[NO] ....... [OKAY][NO][OKAY]....... .......[OKAY] transformer[OKAY] stochastic_transformer............ stochastic_transformer[NO]transformer. ....................[NO] [NO] [OKAY] .......[NO] ....... [OKAY].......[OKAY] stochastic_transformer [OKAY] . [NO] stochastic_transformer....... [OKAY] . [NO] ....... [OKAY] ninjaninjaninjaninja ........................................................................ [OKAY][OKAY][OKAY][OKAY] ---------------------------------------------------------------------------------------------------- -------------------------------------------------- -------------------------------------------------- op name op name op name op name................ ................ ................installed installed................ installed .. installed.. ..compatiblecompatible.. --------------------------------------------------compatible --------------------------------------------------compatible ---------------------------------------------------------------------------------------------------- cpu_adam cpu_adam............... ...............[YES] [YES]cpu_adam......cpu_adam .....................[OKAY]............... [OKAY] [YES] [YES]...... ......[OKAY] [OKAY] fused_adamfused_adam .......................... [NO][NO] fused_adam.............. [OKAY][OKAY]fused_adam............. fused_lamb.............[NO] fused_lamb[NO]............. ....................[NO]....... [NO] [OKAY].......[OKAY] .......[OKAY] fused_lamb[OKAY] fused_lamb............. .............[NO] [NO]....... 
.......[OKAY] [OKAY]sparse_attn sparse_attn............ ............[NO] [NO]....... .......[OKAY] [OKAY] sparse_attn transformersparse_attntransformer ............ ........................ ............[NO] [NO] [NO] [NO] ............................ [OKAY] [OKAY][OKAY] [OKAY]stochastic_transformertransformer stochastic_transformer ............. transformer. [NO] [NO] ............[NO] ....... .............. [NO][OKAY] [OKAY] [OKAY] ....... [OKAY] stochastic_transformerstochastic_transformer .. [NO][NO] .............. [OKAY][OKAY] ninjaninjaninjaninja ...................................................... .................. [OKAY][OKAY] [OKAY] [OKAY] -------------------------------------------------- ---------------------------------------------------------------------------------------------------- --------------------------------------------------op nameop name op name ................ ................op name ................ installedinstalledinstalled................ .. .. installedcompatiblecompatible.. .. compatible----------------------------------------------------------------------------------------------------compatible ---------------------------------------------------------------------------------------------------- cpu_adam cpu_adam............... ...............[YES]cpu_adam [YES].....................cpu_adam ............... [OKAY]...... [YES] [YES] [OKAY] ...... ...... [OKAY] [OKAY] fused_adam ............. fused_adam[NO] .......fused_adam ............. [OKAY] .............fused_adam [NO] [NO].............fused_lamb....... .......[NO] ............. [OKAY][OKAY] ....... [NO] [OKAY].......fused_lambfused_lamb .............[OKAY]fused_lamb............. [NO] .............[NO] ..............[NO] [OKAY][OKAY]....... sparse_attn ............ [NO][OKAY] ....... [OKAY] transformer sparse_attn............sparse_attn [NO]........................ .......[NO][NO] sparse_attn [OKAY].............. 
............[OKAY][OKAY] [NO] stochastic_transformertransformer .......transformer............. [OKAY] [NO] ............[NO] transformer.......[NO]....... [OKAY]....... [OKAY]............ [OKAY]stochastic_transformer[NO] ....... .[OKAY] stochastic_transformer [NO] ........ [OKAY][NO] stochastic_transformer....... [OKAY] . [NO] ....... [OKAY] DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 ninjaninjaninjaninja .................................... .................. .................. [OKAY][OKAY][OKAY] [OKAY]---------------------------------------------------------------------------------------------------- -------------------------------------------------- --------------------------------------------------op name op nameop nameop name ................ ................ ................................ installed installed installedinstalled .. ....compatible.. DeepSpeed general environment info: compatible -------------------------------------------------- compatiblecompatible -------------------------------------------------- ---------------------------------------------------------------------------------------------------- cpu_adam ............... [YES]cpu_adam cpu_adamcpu_adam ...... ............... ..............................[OKAY] torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] [YES][YES][YES] .................. [OKAY][OKAY][OKAY] fused_adam ............. [NO] ....... [OKAY] torch version .................... 
1.8.1 fused_adamfused_adamfused_lamb .............fused_adam.......................... .............[NO][NO][NO] ....... [NO].............. [OKAY][OKAY].......[OKAY] torch cuda version ............... 11.1 [OKAY] nvcc version ..................... 11.2 fused_lamb fused_lamb............. .............[NO] fused_lamb [NO] ....... ............. ....... [OKAY] [NO]sparse_attn[OKAY] ................... [NO][OKAY] ....... [OKAY] deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] transformer sparse_attnsparse_attn............ ........................ [NO] [NO] sparse_attn[NO] ....... ..........................[OKAY] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 [OKAY][OKAY][NO] ....... stochastic_transformertransformer[OKAY]transformer ............ .............[NO]transformer ....... [NO]............ [NO] [OKAY]..............[NO] [OKAY].......[OKAY] stochastic_transformer[OKAY] stochastic_transformer. [NO]stochastic_transformer . ....... .[OKAY][NO] [NO]....... .......[OKAY] [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. 
---------------------------------------------------------------------------------------------------- JIT compiled ops requires ninja---------------------------------------------------------------------------------------------------- DeepSpeed C++/CUDA extension op report DeepSpeed C++/CUDA extension op reportDeepSpeed C++/CUDA extension op report-------------------------------------------------- ----------------------------------------------------------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.-------------------------------------------------- ----------------------------------------------------------------------------------------------------JIT compiled ops requires ninja JIT compiled ops requires ninjaJIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. 
---------------------------------------------------------------------------------------------------- -------------------------------------------------- JIT compiled ops requires ninja DeepSpeed C++/CUDA extension op report DeepSpeed C++/CUDA extension op report-------------------------------------------------- --------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.-------------------------------------------------- DeepSpeed C++/CUDA extension op report--------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. JIT compiled ops requires ninja---------------------------------------------------------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.JIT compiled ops requires ninja -------------------------------------------------- JIT compiled ops requires ninja ninjaninjaninjaninja .................................... .................. ..................[OKAY] [OKAY] [OKAY] [OKAY]-------------------------------------------------- -------------------------------------------------- ----------------------------------------------------------------------------------------------------op nameop name ................op name................op name installed................installed................ ....installedinstalled compatible .. compatible.. 
--------------------------------------------------
DeepSpeed C++/CUDA extension op report
--------------------------------------------------
NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.
--------------------------------------------------
JIT compiled ops requires ninja
ninja .................. [OKAY]
--------------------------------------------------
op name ................ installed .. compatible
--------------------------------------------------
cpu_adam ............... [YES] ...... [OKAY]
fused_adam ............. [NO] ....... [OKAY]
fused_lamb ............. [NO] ....... [OKAY]
sparse_attn ............ [NO] ....... [OKAY]
transformer ............ [NO] ....... [OKAY]
stochastic_transformer . [NO] ....... [OKAY]
--------------------------------------------------
 [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`.
async_io ............... [NO] ....... [NO]
transformer_inference .. [NO] ....... [OKAY]
utils .................. [YES] ...... [OKAY]
quantizer .............. [NO] ....... [OKAY]
--------------------------------------------------
DeepSpeed general environment info:
torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']
torch version .................... 1.8.1
torch cuda version ............... 11.1
nvcc version ..................... 11.2
deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']
deepspeed info ................... 0.4.2+bc17042, bc17042, big-science
deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1
/bin/sh: line 0: type: git: not found
**** Git info for Megatron: git_hash=unknown git_branch=unknown ****
Op compatibility means that your system -------------------------------------------------- meet the required dependencies to JIT install the op. --------------------------------------------------JIT compiled ops requires ninja --------------------------------------------------JIT compiled ops requires ninja JIT compiled ops requires ninja ninjaninjaninjaninja ........................................................................ [OKAY][OKAY][OKAY][OKAY] -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- op nameop nameop name op name ................................................................ installedinstalledinstalledinstalled .. .... .. compatible compatiblecompatiblecompatible ---------------------------------------------------------------------------------------------------- ---------------------------------------------------------------------------------------------------- cpu_adamcpu_adam cpu_adam...............cpu_adam............... [YES][YES].............................. ......[YES] ......[YES] ...... [OKAY][OKAY] [OKAY]...... [OKAY] fused_adamfused_adam fused_adam.............fused_adam............. .............[NO]............. [NO] .......[NO] [NO] ....... [OKAY] .......[OKAY] ....... [OKAY][OKAY] fused_lamb fused_lamb.............fused_lamb [NO].............fused_lamb............. ....... [NO]............. [NO] [OKAY]....... [NO].......[OKAY] .......[OKAY] [OKAY] sparse_attnsparse_attn ........................sparse_attn [NO] sparse_attn[NO] ............ ....... ...................[NO][OKAY] [NO] [OKAY] ....... ....... transformer[OKAY] transformer  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. [OKAY]............ async_io ............... [NO] ....... [NO] ............ transformer[NO] transformer [NO] ....... 
...................[OKAY]............ [OKAY][NO][NO] transformer_inference .. [NO] ....... [OKAY] ..............stochastic_transformer [OKAY]stochastic_transformer[OKAY] . [NO].stochastic_transformer .......stochastic_transformer[NO] . [OKAY] utils .................. [YES] ...... [OKAY] ........[NO] [OKAY][NO]....... .......[OKAY] [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** ninjaninjaninjaninja ........................................................................ [OKAY][OKAY][OKAY] [OKAY] -------------------------------------------------- ---------------------------------------------------------------------------------------------------- --------------------------------------------------op nameop name op name ................................................ op name installedinstalled installed .................... ..installed compatiblecompatible.. --------------------------------------------------compatible-------------------------------------------------- compatible-------------------------------------------------- cpu_adam ............... cpu_adam[YES] -------------------------------------------------- ...............cpu_adam...... [YES]...............[OKAY] ......[YES] ......[OKAY] [OKAY] fused_adam ............. [NO] ....... cpu_adam[OKAY]fused_adam fused_adam............. [NO]fused_lamb............. ...................................[NO] [OKAY][NO]....... ....... [OKAY] fused_lamb[YES][OKAY] sparse_attn ............. ..................[NO]fused_lamb [NO].................... .......[OKAY][OKAY][NO] [OKAY]....... [OKAY] transformer ............ [NO] sparse_attn....... ............[OKAY] [NO]fused_adam sparse_attn ....... stochastic_transformer ............ [OKAY] ............. .[NO][NO] .......transformer[NO] .......[OKAY]............ ....... transformer[OKAY][NO] [OKAY] ................... 
[NO] fused_lamb[OKAY]....... [OKAY] .............stochastic_transformer .stochastic_transformer[NO] .......[NO] ........ [NO][OKAY] ....... [OKAY] [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`.  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] .......async_io [NO] ............... [NO] ....... [NO] transformer_inference .. [NO] ....... transformer_inference[OKAY] .. [NO] ....... [OKAY]utils .................. [YES] ...... [OKAY]utils .................. [YES] ......quantizer [OKAY].............. [NO] .......quantizer [OKAY].............. [NO] ....... --------------------------------------------------[OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_ioasync_io .............................. [NO][NO] .............. [NO][NO] transformer_inference .. transformer_inference[NO] ......... [NO][OKAY] ....... [OKAY] utils .................. utils[YES] ........................ [YES][OKAY] ...... [OKAY] quantizer .............. quantizer[NO] ..................... [NO][OKAY] ....... [OKAY] -------------------------------------------------- -------------------------------------------------- DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... 
['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] async_io ............... [NO] ....... [NO] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] transformer_inferenceutils .................... [NO][YES] ............. [OKAY][OKAY] quantizer utils.............. ..................[NO] [YES] ............. [OKAY][OKAY] quantizer ..............-------------------------------------------------- [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... 
[OKAY] -------------------------------------------------- -------------------------------------------------- ------------------------------------------------------------------------------------------------------------------------------------------------------DeepSpeed C++/CUDA extension op report DeepSpeed C++/CUDA extension op reportDeepSpeed C++/CUDA extension op report--------------------------------------------------DeepSpeed C++/CUDA extension op report --------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.---------------------------------------------------------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. --------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- -------------------------------------------------- JIT compiled ops requires ninja-------------------------------------------------- JIT compiled ops requires ninja JIT compiled ops requires ninjaJIT compiled ops requires ninja ninjaninjaninjaninja .................. ....................................[OKAY].................. [OKAY]-------------------------------------------------- [OKAY] [OKAY] --------------------------------------------------op name -------------------------------------------------- ................ 
--------------------------------------------------op nameop name installed ................ op name................ .. installed ................installed compatible .. installed..-------------------------------------------------- compatible..compatible --------------------------------------------------compatible-------------------------------------------------- cpu_adam-------------------------------------------------- ............... [YES]cpu_adam cpu_adam ...... cpu_adam .............................. [OKAY] ............... [YES][YES] [YES] ............ [OKAY]......[OKAY]fused_adam [OKAY] ............. [NO] ....... [OKAY]fused_adamfused_adam fused_adam..........................fused_lamb .............[NO][NO]............. .......[NO]....... [NO][OKAY] .......[OKAY]....... fused_lamb [OKAY][OKAY] .............fused_lamb .............[NO]fused_lamb [NO]....... .............sparse_attn....... [OKAY][NO]............[OKAY] [NO]....... ....... [OKAY][OKAY] sparse_attn ............transformer sparse_attn[NO]............ ...................[NO]sparse_attn [OKAY].......[NO]............ [NO][OKAY]....... transformer ....... [OKAY] ............stochastic_transformer [OKAY] [NO]transformer. transformer ................... [OKAY][NO][NO] ............ ..............[NO]stochastic_transformer .......[OKAY][OKAY] . [OKAY] [NO] stochastic_transformer....... stochastic_transformer[OKAY]. [NO]. .......[NO] [OKAY]....... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`.  
[WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... async_io[NO] ...................... [NO][NO] ....... [NO] transformer_inference transformer_inference.. ..[NO] [NO]....... .......[OKAY] [OKAY] utilsutils .................................... [YES][YES] ............ [OKAY][OKAY] quantizer ..............quantizer [NO].............. .......[NO] [OKAY]....... [OKAY] -------------------------------------------------- -------------------------------------------------- ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] ninjaninjaninjaninja .................. .................. .................. ..................[OKAY][OKAY] [OKAY] [OKAY] ------------------------------------------------------------------------------------------------------------------------------------------------------ --------------------------------------------------op nameop name op name ................ ................op name ................ installedinstalled ................installed.... compatiblecompatible..installed --------------------------------------------------compatible ..-------------------------------------------------- --------------------------------------------------compatible -------------------------------------------------- cpu_adam ............... [YES]cpu_adam cpu_adam.....................cpu_adam [OKAY]............... [YES]...............[YES] ......[YES] ...... [OKAY] ...... [OKAY]fused_adam [OKAY] ............. [NO] ....... [OKAY] fused_adam fused_lamb............. 
fused_adam.............fused_adam [NO] ............. .............[NO] [NO].......[NO] .......[OKAY]....... ....... [OKAY][OKAY] fused_lamb[OKAY] fused_lamb............. [NO].............fused_lamb .......[NO]............. sparse_attn[OKAY] ....... [NO]............ [OKAY] ....... [NO] [OKAY]....... [OKAY] sparse_attn ............transformer sparse_attn[NO]............ .......[NO]............ .......sparse_attn[OKAY] [NO] [OKAY] ................... transformer[OKAY][NO]stochastic_transformer .................... transformer[NO][OKAY][NO] .......................... transformer [OKAY] [OKAY] [NO]............ .......[NO]stochastic_transformer [OKAY]....... .[OKAY] stochastic_transformer[NO] ....... .[OKAY]stochastic_transformer [NO] ........ [NO][OKAY] ....... [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_ioasync_io .............................. [NO][NO] .............. [NO][NO] transformer_inferencetransformer_inference .... [NO][NO] .............. [OKAY][OKAY] utils utils.................. ..................[YES] [YES]...... ......[OKAY] [OKAY] quantizerquantizer ............................ [NO][NO] .............. [OKAY][OKAY] -------------------------------------------------- -------------------------------------------------- -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... 
[NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. ------------------------------------------------------------------------------------------------------------------------------------------------------ DeepSpeed C++/CUDA extension op report-------------------------------------------------- DeepSpeed C++/CUDA extension op report DeepSpeed C++/CUDA extension op report---------------------------------------------------------------------------------------------------- async_io ............... [NO] ....... 
[NO] DeepSpeed C++/CUDA extension op reportNOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.--------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. ----------------------------------------------------------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.-------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. transformer_inference .. [NO] ....... [OKAY] JIT compiled ops requires ninjaJIT compiled ops requires ninja -------------------------------------------------- -------------------------------------------------- JIT compiled ops requires ninja JIT compiled ops requires ninja utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. 
[NO] ....... [OKAY] -------------------------------------------------- /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown ****  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io async_io............... [NO]............... ....... [NO][NO] ....... [NO] transformer_inference transformer_inference.. ..[NO] [NO]....... .......[OKAY] [OKAY] /bin/sh: line 0: type: git: not found utilsutils .................................... [YES][YES] ............ [OKAY][OKAY] quantizer quantizer.............. ..............[NO] [NO]....... .......[OKAY] [OKAY] -------------------------------------------------- -------------------------------------------------- **** Git info for Megatron: git_hash=unknown git_branch=unknown ****  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... async_io[NO] ...................... [NO][NO] ....... [NO]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. transformer_inference .. [NO] transformer_inference....... ..[OKAY] [NO] ....... [OKAY] async_io ............... [NO] ....... [NO] utils .................. [YES] ...... utils[OKAY] transformer_inference .. [NO] ....... [OKAY] .................. [YES] quantizer...... ..............[OKAY] [NO] ....... [OKAY]quantizer .............. [NO] .......-------------------------------------------------- [OKAY] utils .................. [YES] ...... [OKAY] -------------------------------------------------- quantizer .............. [NO] ....... 
[OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- ninjaninjaninjaninja ........................................................................ [OKAY][OKAY][OKAY][OKAY] ------------------------------------------------------------------------------------------------------------------------------------------------------ -------------------------------------------------- op name op nameop nameop name ................................ ................................ installedinstalledinstalled installed .. .. .... compatiblecompatiblecompatiblecompatible ------------------------------------------------------------------------------------------------------------------------------------------------------ -------------------------------------------------- cpu_adam cpu_adamcpu_adam............... cpu_adam ............... [YES] .............................. ......[YES] ......[OKAY][YES] [YES] [OKAY]............ [OKAY][OKAY] fused_adam ............. [NO] ....... [OKAY] fused_adamfused_adamfused_adam ..........................fused_lamb ..........................[NO] [NO] [NO][NO].............. ....... .......[OKAY][OKAY] [OKAY] [OKAY] fused_lamb fused_lamb............. fused_lamb.............[NO] .................... [NO][NO]sparse_attn[OKAY] .......................... [OKAY][NO][OKAY] ....... [OKAY] transformersparse_attn ........................ [NO]sparse_attn [NO] ....... sparse_attn............ ....... ............[OKAY] [NO][OKAY][NO] ....... stochastic_transformer[OKAY]....... transformer [OKAY]............ .transformer [NO][NO] ..........................transformer [NO][OKAY] [OKAY]................... 
stochastic_transformer[NO][OKAY] ....... .[OKAY] stochastic_transformer [NO] ........ stochastic_transformer [OKAY][NO] ....... .[OKAY] [NO] ....... [OKAY] ---------------------------------------------------------------------------------------------------- DeepSpeed C++/CUDA extension op reportDeepSpeed C++/CUDA extension op report ------------------------------------------------------------------------------------------------------------------------------------------------------ NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. DeepSpeed C++/CUDA extension op report-------------------------------------------------- -------------------------------------------------- --------------------------------------------------JIT compiled ops requires ninjaJIT compiled ops requires ninja NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. ---------------------------------------------------------------------------------------------------- JIT compiled ops requires ninjaDeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... 
 [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`.

async_io ............... [NO] ....... [NO]
transformer_inference .. [NO] ....... [OKAY]
utils .................. [YES] ...... [OKAY]
quantizer .............. [NO] ....... [OKAY]
--------------------------------------------------

DeepSpeed general environment info:
torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']
torch version .................... 1.8.1
torch cuda version ............... 11.1
nvcc version ..................... 11.2
deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']
deepspeed info ................... 0.4.2+bc17042, bc17042, big-science
deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1

--------------------------------------------------
DeepSpeed C++/CUDA extension op report
--------------------------------------------------
NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.
--------------------------------------------------
JIT compiled ops requires ninja

ninja .................. [OKAY]
--------------------------------------------------
op name ................ installed .. compatible
--------------------------------------------------
cpu_adam ............... [YES] ...... [OKAY]
fused_adam ............. [NO] ....... [OKAY]
fused_lamb ............. [NO] ....... [OKAY]
sparse_attn ............ [NO] ....... [OKAY]
transformer ............ [NO] ....... [OKAY]
stochastic_transformer . [NO] ....... [OKAY]
--------------------------------------------------

/bin/sh: line 0: type: git: not found
**** Git info for Megatron: git_hash=unknown git_branch=unknown ****
[OKAY] [OKAY] [OKAY][OKAY] -------------------------------------------------- -------------------------------------------------- -------------------------------------------------- --------------------------------------------------op name op name op name ................op name................ ................................installedinstalled installed..installed .. compatible ..--------------------------------------------------..compatible compatiblecompatible -------------------------------------------------- -------------------------------------------------- --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. cpu_adam ............... [YES] cpu_adamcpu_adam......cpu_adam ...............[OKAY].............................. [YES][YES][YES] .................. [OKAY][OKAY][OKAY] async_io ............... [NO] ....... [NO] fused_adam ............. [NO] ....... fused_adamfused_adam[OKAY] fused_adam transformer_inference .. [NO] ....... [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`.utils .......................... .............[NO] [NO] fused_lamb [NO]....... ....... .................... [OKAY] [OKAY] [NO][OKAY] .................. [YES] ...... [OKAY] .......fused_lamb [OKAY]fused_lamb.............  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. fused_lamb .............[NO]............. [NO][NO]....... ..............[OKAY] [OKAY] [OKAY] quantizer .............. [NO] ....... [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. sparse_attn ............ [NO] ....... [OKAY] -------------------------------------------------- async_io ............... [NO] .......async_io [NO] ............... [NO] ....... 
[NO] transformer sparse_attnsparse_attn............ sparse_attn ........................[NO] [NO] ............[NO]....... ....... [NO] [OKAY] [OKAY]....... async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY]transformer_inference .. [NO] .......utils [OKAY].................. ....... [OKAY][OKAY] transformer_inference .. [NO] ....... [OKAY] [YES] ...... [OKAY] stochastic_transformertransformertransformer transformer ............ ............. ............ [NO] [NO] [NO][NO]....... ....... ....... .......[OKAY] [OKAY] [OKAY] utils .................. [YES] ...... [OKAY] utils .................. [YES]quantizer .................... [OKAY][NO] [OKAY] quantizer .............. [NO] ....... [OKAY] ....... quantizer[OKAY] .............. [NO] .......-------------------------------------------------- [OKAY] stochastic_transformer stochastic_transformer.stochastic_transformer [NO]. ........[NO] [NO][OKAY]....... .......[OKAY] [OKAY] -------------------------------------------------- --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed general environment info:DeepSpeed general environment info: torch install pathtorch install path .............................. 
['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch versiontorch version ........................................ 1.8.11.8.1 torch cuda versiontorch cuda version .............................. 11.111.1 nvcc versionnvcc version .......................................... 11.211.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info deepspeed install path................... ...........0.4.2+bc17042, bc17042, big-science ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_ioasync_io .............................. [NO][NO] .............. [NO][NO] transformer_inferencetransformer_inference .... [NO][NO] .............. [OKAY][OKAY] utilsutils .................................... [YES] [YES]...... ......[OKAY] [OKAY] quantizer .............. quantizer[NO] ..................... [NO][OKAY] ....... [OKAY] -------------------------------------------------- -------------------------------------------------- DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 
0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... 
torch 1.8, cuda 11.1 ------------------------------------------------------------------------------------------------------------------------------------------------------ DeepSpeed C++/CUDA extension op reportDeepSpeed C++/CUDA extension op reportDeepSpeed C++/CUDA extension op report ------------------------------------------------------------------------------------------------------------------------------------------------------ NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.-------------------------------------------------- -------------------------------------------------- -------------------------------------------------- --------------------------------------------------JIT compiled ops requires ninja DeepSpeed C++/CUDA extension op report JIT compiled ops requires ninja JIT compiled ops requires ninja -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... 
['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. /bin/sh: line 0: type: git: not found async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] DeepSpeed general environment info: utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 -------------------------------------------------- nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... 
[OKAY] DeepSpeed general environment info: -------------------------------------------------- torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science **** Git info for Megatron: git_hash=unknown git_branch=unknown **** ninjaninjaninjaninja .................................... .................................... [OKAY] [OKAY][OKAY] deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 [OKAY] -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- op nameop nameop nameop name ................................................................ installed installedinstalled installed .... .. .. compatible compatiblecompatiblecompatible-------------------------------------------------- ------------------------------------------------------------------------------------------------------------------------------------------------------ DeepSpeed general environment info: cpu_adam ............... cpu_adamcpu_adam[YES] cpu_adam ............... .................................... [YES][YES][YES][OKAY] torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 ...... ...... ...... [OKAY] [OKAY] [OKAY] torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 
0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 fused_adam ............. [NO]fused_adam fused_adamfused_adam ....... ............. .......................... [NO][OKAY][NO] .......[NO]....... fused_lamb.......[OKAY] [OKAY] ............. DeepSpeed general environment info: [OKAY] fused_lamb [NO]fused_lamb ....................fused_lamb............. [NO] [NO] .............[OKAY] ....... ....... [NO] [OKAY] [OKAY] ....... [OKAY] torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 sparse_attn ............sparse_attnsparse_attn sparse_attn [NO]........................ [NO]................... [NO] [NO] .......[OKAY] ....... ....... [OKAY] torch cuda version ............... 11.1 [OKAY][OKAY]transformer nvcc version ..................... 11.2 transformer............ transformer ............ transformer[NO] [NO]............................... .......[NO][OKAY][NO] [OKAY].............. [OKAY][OKAY] deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] DeepSpeed general environment info: stochastic_transformerstochastic_transformer stochastic_transformerstochastic_transformer. . [NO] .. [NO] [NO] ....... [NO] .............. [OKAY] ....... [OKAY] [OKAY] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 [OKAY] torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... 
torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch version .................... 1.8.1 torch cuda version ............... 11.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed info ................... 
0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. 
[NO] ....... [OKAY] -------------------------------------------------- /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] **** Git info for Megatron: git_hash=unknown git_branch=unknown **** torch version .................... 1.8.1 **** Git info for Megatron: git_hash=unknown git_branch=unknown **** torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info:DeepSpeed general environment info: torch install pathtorch install path .............................. ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch versiontorch version ........................................ 1.8.11.8.1 torch cuda versiontorch cuda version .............................. 11.111.1 nvcc versionnvcc version .......................................... 11.211.2 deepspeed install pathdeepspeed install path ...................... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed infodeepspeed info ...................................... 0.4.2+bc17042, bc17042, big-science0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w.deepspeed wheel compiled w. ............ torch 1.8, cuda 11.1torch 1.8, cuda 11.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... 
[NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- async_ioasync_io .............................. [NO][NO] .............. [NO][NO] transformer_inference transformer_inference.. ..[NO] [NO]....... .......[OKAY] [OKAY] utils .................. [YES] ...... utils[OKAY] .................. [YES] ......quantizer [OKAY].............. [NO] .......quantizer [OKAY].............. [NO] .......-------------------------------------------------- [OKAY] -------------------------------------------------- DeepSpeed general environment info:DeepSpeed general environment info: torch install pathtorch install path ............... ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] DeepSpeed general environment info:DeepSpeed general environment info: torch versiontorch version ........................................ 1.8.11.8.1 torch cuda versiontorch cuda version .............................. 11.111.1 nvcc versionnvcc version .......................................... 11.211.2 torch install pathtorch install path .............................. ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch versiontorch version ........................................ 1.8.11.8.1 deepspeed install pathdeepspeed install path ...................... 
['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed infodeepspeed info ...................................... 0.4.2+bc17042, bc17042, big-science0.4.2+bc17042, bc17042, big-science torch cuda versiontorch cuda version .............................. 11.111.1 deepspeed wheel compiled w. deepspeed wheel compiled w....... ......torch 1.8, cuda 11.1 torch 1.8, cuda 11.1 nvcc versionnvcc version .......................................... 11.211.2 deepspeed install pathdeepspeed install path ...................... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed infodeepspeed info ...................................... 0.4.2+bc17042, bc17042, big-science0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. deepspeed wheel compiled w....... ......torch 1.8, cuda 11.1 torch 1.8, cuda 11.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO]async_io ............... [NO] ....... [NO] transformer_inference .. transformer_inference[NO] ......... [NO][OKAY] ....... [OKAY] utils .................. utils[YES] ........................ [YES][OKAY] ...... [OKAY] quantizer .............. [NO]quantizer ..................... [OKAY][NO] ....... [OKAY] -------------------------------------------------- --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. 
[YES] ...... [OKAY]
quantizer .............. [NO] ....... [OKAY]
--------------------------------------------------
--------------------------------------------------
DeepSpeed C++/CUDA extension op report
--------------------------------------------------
NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.
--------------------------------------------------
JIT compiled ops requires ninja
ninja .................. [OKAY]
--------------------------------------------------
op name ................ installed .. compatible
--------------------------------------------------
cpu_adam ............... [YES] ...... [OKAY]
fused_adam ............. [NO] ....... [OKAY]
fused_lamb ............. [NO] ....... [OKAY]
sparse_attn ............ [NO] ....... [OKAY]
transformer ............ [NO] ....... [OKAY]
stochastic_transformer . [NO] ....... [OKAY]
--------------------------------------------------
 [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`.
async_io ............... [NO] ....... [NO]
transformer_inference .. [NO] ....... [OKAY]
utils .................. [YES] ...... [OKAY]
quantizer .............. [NO] ....... [OKAY]
--------------------------------------------------
/bin/sh: line 0: type: git: not found
**** Git info for Megatron: git_hash=unknown git_branch=unknown ****
DeepSpeed general environment info:
torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']
torch version .................... 1.8.1
torch cuda version ............... 11.1
nvcc version ..................... 11.2
deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']
deepspeed info ................... 0.4.2+bc17042, bc17042, big-science
deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1
[OKAY][OKAY] quantizer quantizer.............. ..............[NO] [NO]....... .......[OKAY] [OKAY] -------------------------------------------------- --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_ioasync_io ............... ...............[NO] [NO]....... .......[NO] [NO] transformer_inference transformer_inference.. ..[NO] [NO]....... .......[OKAY] [OKAY] utils ..................utils [YES].................. ......[YES] [OKAY]...... [OKAY] quantizerquantizer ............................ [NO][NO] .............. [OKAY][OKAY] -------------------------------------------------- -------------------------------------------------- DeepSpeed general environment info:DeepSpeed general environment info: torch install pathtorch install path .............................. ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch versiontorch version ........................................ 1.8.1 1.8.1 torch cuda version torch cuda version............... ...............11.1 11.1nvcc version nvcc version..................... .....................11.2 11.2deepspeed install path deepspeed install path........... ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']deepspeed info deepspeed info................... ...................0.4.2+bc17042, bc17042, big-science 0.4.2+bc17042, bc17042, big-sciencedeepspeed wheel compiled w. deepspeed wheel compiled w....... ......torch 1.8, cuda 11.1 torch 1.8, cuda 11.1 ninjaninjaninjaninja ........................................................................ 
[OKAY][OKAY][OKAY][OKAY] -------------------------------------------------- ------------------------------------------------------------------------------------------------------------------------------------------------------ op name op nameop name................op name ................................installed ................ installedinstalled .... installedcompatible..compatible ..--------------------------------------------------compatible-------------------------------------------------- compatible-------------------------------------------------- -------------------------------------------------- cpu_adamcpu_adam ..............................cpu_adam [YES][YES]cpu_adam............... ..................... ......[YES][OKAY] [YES] [OKAY] ...... ...... [OKAY][OKAY] fused_adam ............. fused_adam[NO] .............fused_adam fused_adam[NO]....... [OKAY]................................. [OKAY][NO] [NO]fused_lamb .......fused_lamb.................... [OKAY].............[NO] [OKAY] .......[NO] fused_lamb[OKAY] ....... .............[OKAY]fused_lamb [NO]............. .......[NO] [OKAY]....... [OKAY]sparse_attn ............ sparse_attn[NO] ................... [NO][OKAY]sparse_attn ....... ............transformer[OKAY]sparse_attn [NO]........................ transformer .......[NO] [NO] ................... [OKAY] [OKAY]....... [NO] [OKAY].......transformer stochastic_transformer [OKAY] ............. transformer [NO]stochastic_transformer [NO] ............ ........ ....... [NO][OKAY][NO][OKAY] ....... .......[OKAY] stochastic_transformer[OKAY] . [NO]stochastic_transformer ....... [OKAY]. [NO] ....... [OKAY] /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info: torch install path ............... 
['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 ---------------------------------------------------------------------------------------------------- DeepSpeed C++/CUDA extension op report DeepSpeed C++/CUDA extension op report-------------------------------------------------- ----------------------------------------------------------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.-------------------------------------------------- DeepSpeed C++/CUDA extension op report--------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. 
DeepSpeed C++/CUDA extension op report -------------------------------------------------- JIT compiled ops requires ninja ---------------------------------------------------------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. JIT compiled ops requires ninja -------------------------------------------------- -------------------------------------------------- JIT compiled ops requires ninjaJIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- --------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. DeepSpeed general environment info: ---------------------------------------------------------------------------------------------------- DeepSpeed C++/CUDA extension op reportJIT compiled ops requires ninja --------------------------------------------------DeepSpeed C++/CUDA extension op report-------------------------------------------------- --------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.DeepSpeed C++/CUDA extension op report NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. 
Op compatibility means that your system meet the required dependencies to JIT install the op.-------------------------------------------------- -------------------------------------------------- torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 --------------------------------------------------JIT compiled ops requires ninjaNOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. JIT compiled ops requires ninja-------------------------------------------------- JIT compiled ops requires ninja deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 
1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 ninjaninjaninjaninja ........................................................................ [OKAY][OKAY][OKAY][OKAY] ------------------------------------------------------------------------------------------------------------------------------------------------------ -------------------------------------------------- op name op name op name ................ op name................ ................ installed ................ installedinstalled .. installed.... compatiblecompatible..compatible compatible-------------------------------------------------- ------------------------------------------------------------------------------------------------------------------------------------------------------ cpu_adam cpu_adamcpu_adam...............cpu_adam ...............[YES].............................. ......[YES][YES][YES] [OKAY]...... ............[OKAY] [OKAY][OKAY] fused_adam fused_adam.............fused_adamfused_adam [NO]....................................... .......[NO][NO][NO] [OKAY].............. ....... [OKAY] [OKAY][OKAY] fused_lamb fused_lamb............. 
fused_lamb.............fused_lamb[NO] [NO].......................... ....... ....... [NO][NO][OKAY][OKAY] .............. [OKAY][OKAY] sparse_attnsparse_attn ............sparse_attn............ sparse_attn [NO][NO] ............ ............ ..............[NO][NO] [OKAY].......[OKAY]....... [OKAY] [OKAY] transformer transformer transformer ............ ............ transformer............ [NO] [NO] [NO] ............ .............. ....... [NO] [OKAY] [OKAY] [OKAY]....... [OKAY]stochastic_transformer stochastic_transformerstochastic_transformer . stochastic_transformer..[NO] [NO] .[NO] ....... ....... .......[NO][OKAY][OKAY] [OKAY]....... [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed general environment info: ninjaninjaninjaninja .................. .................. .................................... [OKAY] [OKAY][OKAY] [OKAY] -------------------------------------------------- ------------------------------------------------------------------------------------------------------------------------------------------------------op name op name................op name op name ................ installed................ ................ installed..installed installed....compatible torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version ....................DeepSpeed general environment info: 1.8.1 ..compatiblecompatible -------------------------------------------------- -------------------------------------------------- -------------------------------------------------- compatible -------------------------------------------------- cpu_adamcpu_adam cpu_adam............... 
..............................[YES]cpu_adam [YES]...............[YES]...... ......[YES][OKAY]...... torch cuda version ............... 11.1torch install path nvcc version............... ..................... 11.2 deepspeed install path ........... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed infotorch version ....................................... 0.4.2+bc17042, bc17042, big-science1.8.1 [OKAY][OKAY]...... [OKAY] deepspeed wheel compiled w. torch cuda version...... ...............torch 1.8, cuda 11.1 11.1 nvcc version ..................... 11.2 DeepSpeed general environment info:DeepSpeed general environment info: fused_adam ............. fused_adamfused_adam[NO] .............fused_adam.................... [NO] [NO]............. [OKAY] ....... deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] ....... [NO] [OKAY] .......[OKAY]fused_lamb [OKAY]............. deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 torch install pathtorch install path .............................. ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch versiontorch version ........................................ 1.8.11.8.1 fused_lamb fused_lamb[NO]............. fused_lamb....................[NO] .......[OKAY] [NO] ............. [OKAY] ....... [NO] [OKAY]....... torch cuda versiontorch cuda version .............................. 11.111.1 [OKAY] nvcc versionnvcc version .......................................... 11.211.2 sparse_attn ............sparse_attn [NO]............ .......[NO] sparse_attn[OKAY] sparse_attn ....... ............ ............ transformer[OKAY] [NO] ............[NO]....... 
transformer [NO] ...................[OKAY] deepspeed install pathdeepspeed install path ...................... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed infodeepspeed info ...................................... 0.4.2+bc17042, bc17042, big-science0.4.2+bc17042, bc17042, big-science [OKAY].......[NO] transformer [OKAY] transformer....... deepspeed wheel compiled w.deepspeed wheel compiled w. ............ torch 1.8, cuda 11.1torch 1.8, cuda 11.1 ............ ............ stochastic_transformer [OKAY][NO][NO] ............... stochastic_transformer[NO] [OKAY] [OKAY] ........ [NO][OKAY] stochastic_transformerstochastic_transformer....... [OKAY]. . [NO] .......[NO] [OKAY]....... [OKAY] DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version DeepSpeed general environment info:............... 11.1 nvcc version ..................... torch install path11.2 deepspeed install path............... ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']................... 0.4.2+bc17042, bc17042, big-science torch versiondeepspeed wheel compiled w. .......................... 1.8.1torch 1.8, cuda 11.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... 
torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown ******** Git info for Megatron: git_hash=unknown git_branch=unknown ****  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_ioasync_io .............................. [NO][NO] .............. [NO][NO] transformer_inference transformer_inference.. [NO] ......... [OKAY][NO] ....... [OKAY] utils .................. [YES] ......utils [OKAY].................. [YES] ...... [OKAY] quantizer .............. [NO]quantizer ..................... [OKAY][NO] --------------------------------------------------....... [OKAY] /bin/sh: line 0: type: git: not found --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] /bin/sh: line 0: type: git: not found -------------------------------------------------- **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... 
torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ...............async_io [NO] ...................... [NO][NO] ....... [NO] transformer_inferencetransformer_inference .... [NO][NO] .............. [OKAY][OKAY] utilsutils .................................... [YES][YES] ............ [OKAY][OKAY] quantizerquantizer ............................ [NO][NO] .............. [OKAY][OKAY] -------------------------------------------------- -------------------------------------------------- DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 
1.8.1 torch cuda version ...............DeepSpeed general environment info: 11.1 nvcc version ..................... 11.2 deepspeed install path torch install path........... ...............['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w.['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] ...... torch 1.8, cuda 11.1 torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... 
torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info: **** Git info for Megatron: git_hash=unknown git_branch=unknown **** torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... 
[OKAY]
--------------------------------------------------
DeepSpeed general environment info:
torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']
torch version .................... 1.8.1
torch cuda version ............... 11.1
nvcc version ..................... 11.2
deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']
deepspeed info ................... 0.4.2+bc17042, bc17042, big-science
deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1
/bin/sh: line 0: type: git: not found
**** Git info for Megatron: git_hash=unknown git_branch=unknown ****
 [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`.
async_io ............... [NO] ....... [NO]
transformer_inference .. [NO] ....... [OKAY]
utils .................. [YES] ...... [OKAY]
quantizer .............. [NO] ....... [OKAY]
--------------------------------------------------
DeepSpeed C++/CUDA extension op report
--------------------------------------------------
NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.
--------------------------------------------------
JIT compiled ops requires ninja
--------------------------------------------------
ninja .................. [OKAY]
--------------------------------------------------
op name ................ installed .. compatible
--------------------------------------------------
cpu_adam ............... [YES] ...... [OKAY]
fused_adam ............. [NO] ....... [OKAY]
fused_lamb ............. [NO] ....... [OKAY]
sparse_attn ............ [NO] ....... [OKAY]
transformer ............ [NO] ....... [OKAY]
stochastic_transformer . [NO] ....... [OKAY]
--------------------------------------------------
['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] .................... 1.8.1torch version .................... torch cuda version1.8.1 ............... 11.1torch cuda version nvcc version............... .....................11.1 11.2nvcc version deepspeed install path ................................ 11.2 ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed install pathdeepspeed info .............................. 0.4.2+bc17042, bc17042, big-science['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed wheel compiled w.deepspeed info ......................... 0.4.2+bc17042, bc17042, big-sciencetorch 1.8, cuda 11.1 deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] **** Git info for Megatron: git_hash=unknown git_branch=unknown **** transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- quantizer .............. [NO] ....... 
[OKAY] -------------------------------------------------- /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown ****  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] async_io....... [NO]............... [NO] ....... [NO] transformer_inference ..transformer_inference [NO].. [NO]....... .......[OKAY] [OKAY] utils utils.................. ..................[YES] [YES]...... ......[OKAY] [OKAY] quantizer quantizer.............. ..............[NO] [NO]....... .......[OKAY] [OKAY] ----------------------------------------------------------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... 
[OKAY] -------------------------------------------------- /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`.  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] **** Git info for Megatron: git_hash=unknown git_branch=unknown **** async_io ............... [NO] ....... [NO] DeepSpeed general environment info: transformer_inference .. [NO] ....... [OKAY] torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] transformer_inference utils.. ..................[NO] [YES]....... ......[OKAY] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... 
['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] [OKAY] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 quantizer utils.............. [NO].................. .......[YES] [OKAY]...... [OKAY] -------------------------------------------------- DeepSpeed general environment info: quantizer .............. [NO] ....... [OKAY] torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] -------------------------------------------------- torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. 
...... torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown ****  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] /bin/sh: line 0: type: git: not found transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. torch version .................... 1.8.1 async_io ............... [NO] ....... [NO] torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... 
['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] transformer_inference .. [NO] ....... [OKAY] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 **** Git info for Megatron: git_hash=unknown git_branch=unknown **** utils .................. [YES] ...... [OKAY] **** Git info for Megatron: git_hash=unknown git_branch=unknown **** quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_ioasync_io .............................. [NO][NO] .............. [NO][NO] transformer_inference transformer_inference.. ..[NO] [NO]....... .......[OKAY] [OKAY] utils utils.................. ..................[YES] [YES]...... ......[OKAY] [OKAY] quantizerquantizer ............................ [NO][NO] .............. [OKAY][OKAY] ---------------------------------------------------------------------------------------------------- DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. 
async_io ............... [NO] .......async_io [NO] ............... [NO] ....... [NO] transformer_inference .. [NO] .......transformer_inference [OKAY].. [NO] ....... [OKAY]utils .................. [YES] ...... [OKAY]utils .................. [YES] quantizer...... [OKAY].............. [NO] .......quantizer [OKAY].............. [NO] ....... --------------------------------------------------[OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. -------------------------------------------------- async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 using world size: 256, data-parallel-size: 8, tensor-model-parallel size: 4, pipeline-model-parallel size: 8 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science using torch.float16 for parameters ... deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 ------------------------ arguments ------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. 
async_io ............... [NO] ....... [NO] accumulate_allreduce_grads_in_fp32 .............. False transformer_inference .. [NO] ....... [OKAY] adam_beta1 ...................................... 0.9 adam_beta2 ...................................... 0.999 adam_eps ........................................ 1e-08 utils .................. [YES] ...... [OKAY] adlr_autoresume ................................. False adlr_autoresume_interval ........................ 1000 apply_query_key_layer_scaling ................... True apply_residual_connection_post_layernorm ........ False attention_dropout ............................... 0.1 attention_softmax_in_fp32 ....................... False bert_binary_head ................................ True bert_load ....................................... None bf16 ............................................ False bias_dropout_fusion ............................. True bias_gelu_fusion ................................ True biencoder_projection_dim ........................ 0 quantizer .............. [NO] ....... [OKAY] biencoder_shared_query_context_model ............ False block_data_path ................................. None checkpoint_activations .......................... True -------------------------------------------------- checkpoint_in_cpu ............................... False checkpoint_num_layers ........................... 1 clip_grad ....................................... 1.0 codecarbon_dir .................................. /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/tr8-104B-logs/codecarbon consumed_train_samples .......................... 0 consumed_valid_samples .......................... 0 contigious_checkpointing ........................ False cpu_optimizer ................................... False cpu_torch_adam .................................. False data_impl ....................................... mmap data_parallel_size .............................. 8 data_path ....................................... 
['/gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document'] dataloader_type ................................. single DDP_impl ........................................ local decoder_seq_length .............................. None deepscale ....................................... False deepscale_config ................................ None deepspeed ....................................... True deepspeed_activation_checkpointing .............. True deepspeed_config ................................ ./ds_config.1271196.json deepspeed_mpi ................................... False distribute_checkpointed_activations ............. False distributed_backend ............................. nccl embedding_path .................................. None encoder_seq_length .............................. 2048 eod_mask_loss ................................... False eval_interval ................................... 1000 eval_iters ...................................... 5 evidence_data_path .............................. None exit_duration_in_mins ........................... 1190 exit_interval ................................... None ffn_hidden_size ................................. 20480 finetune ........................................ False fp16 ............................................ True fp16_lm_cross_entropy ........................... False fp32_residual_connection ........................ False global_batch_size ............................... 2048 hidden_dropout .................................. 0.1 hidden_size ..................................... 16384 hysteresis ...................................... 2 ict_head_size ................................... None ict_load ........................................ None img_dim ......................................... 224 indexer_batch_size .............................. 128 indexer_log_interval ............................ 1000 init_method_std ................................. 
0.02 init_method_xavier_uniform ...................... False initial_loss_scale .............................. 4294967296 kv_channels ..................................... 512 layernorm_epsilon ............................... 1e-05 lazy_mpu_init ................................... None load ............................................ /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints local_rank ...................................... 0 log_batch_size_to_tensorboard ................... True log_interval .................................... 10 log_learning_rate_to_tensorboard ................ True log_loss_scale_to_tensorboard ................... True log_num_zeros_in_grad ........................... False log_params_norm ................................. False log_timers_to_tensorboard ....................... True log_validation_ppl_to_tensorboard ............... True loss_scale ...................................... 12.0 loss_scale_window ............................... 1000 lr .............................................. 6e-05 lr_decay_iters .................................. None lr_decay_samples ................................ 126953125 lr_decay_style .................................. cosine lr_warmup_fraction .............................. None lr_warmup_iters ................................. 0 lr_warmup_samples ............................... 216320 make_vocab_size_divisible_by .................... 128 mask_prob ....................................... 0.15 masked_softmax_fusion ........................... True max_position_embeddings ......................... 2048 memory_centric_tiled_linear ..................... False merge_file ...................................... /gpfswork/rech/six/commun/code/tr8-104B/Megatron-DeepSpeed-tr8-104B/data/gpt2-merges.txt micro_batch_size ................................ 1 min_loss_scale .................................. 1.0 min_lr .......................................... 
6e-06 mmap_warmup ..................................... False no_load_optim ................................... None no_load_rng ..................................... None no_save_optim ................................... None no_save_rng ..................................... None num_attention_heads ............................. 32 num_channels .................................... 3 num_classes ..................................... 1000 num_layers ...................................... 32 num_layers_per_virtual_pipeline_stage ........... None num_workers ..................................... 2 onnx_safe ....................................... None openai_gelu ..................................... False optimizer ....................................... adam override_lr_scheduler ........................... False params_dtype .................................... torch.float16 partition_activations ........................... False patch_dim ....................................... 16 pipeline_model_parallel_size .................... 8 position_embedding_type ......................... PositionEmbeddingType.absolute profile_backward ................................ False query_in_block_prob ............................. 0.1 rampup_batch_size ............................... ['16', '16', '6_000_000'] rank ............................................ 0 remote_device ................................... none reset_attention_mask ............................ False reset_position_ids .............................. False retriever_report_topk_accuracies ................ [] retriever_score_scaling ......................... False retriever_seq_length ............................ 256 sample_rate ..................................... 1.0 save ............................................ /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints save_interval ................................... 1500 scatter_gather_tensors_in_pipeline .............. 
True scattered_embeddings ............................ False seed ............................................ 43 seq_length ...................................... 2048 sgd_momentum .................................... 0.9 short_seq_prob .................................. 0.1 split ........................................... 949,50,1 split_transformers .............................. False synchronize_each_layer .......................... False tensor_model_parallel_size ...................... 4 tensorboard_dir ................................. /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/tr8-104B-logs/tensorboard tensorboard_log_interval ........................ 1 tensorboard_queue_size .......................... 5 tile_factor ..................................... 1 titles_data_path ................................ None tokenizer_name_or_path .......................... None tokenizer_type .................................. GPT2BPETokenizer train_iters ..................................... None train_samples ................................... 300000000 use_checkpoint_lr_scheduler ..................... False use_contiguous_buffers_in_ddp ................... False use_cpu_initialization .......................... None use_one_sent_docs ............................... False use_pin_memory .................................. False virtual_pipeline_model_parallel_size ............ None vocab_extra_ids ................................. 0 vocab_file ...................................... /gpfswork/rech/six/commun/code/tr8-104B/Megatron-DeepSpeed-tr8-104B/data/gpt2-vocab.json weight_decay .................................... 0.1 world_size ...................................... 256 zero_allgather_bucket_size ...................... 0.0 zero_contigious_gradients ....................... False zero_reduce_bucket_size ......................... 0.0 zero_reduce_scatter ............................. False zero_stage ...................................... 
1 -------------------- end of arguments --------------------- will use batch size rampup starting from global batch size 16 to global batch size 2048 with batch size increments 16 over 6000000 samples. > building GPT2BPETokenizer tokenizer ... DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info:DeepSpeed general environment info: /bin/sh: line 0: type: git: not found torch install pathtorch install path .............................. 
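As a sanity check on the parallel layout reported above, the world size must equal the product of the three parallel degrees. A minimal sketch (plain Python, no Megatron dependency; the variable names mirror the argument names in the dump):

```python
# Parallelism degrees as reported in the arguments dump above.
data_parallel_size = 8
tensor_model_parallel_size = 4
pipeline_model_parallel_size = 8

# Megatron requires world_size = dp * tp * pp; here 8 * 4 * 8 = 256 GPUs,
# matching "using world size: 256" in the log.
world_size = (data_parallel_size
              * tensor_model_parallel_size
              * pipeline_model_parallel_size)
assert world_size == 256
```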
torch cuda versiontorch 1.8, cuda 11.1 ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 DeepSpeed general environment info: DeepSpeed general environment info: torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science torch install path ............... torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'].................... 1.8.1 torch version torch cuda version.................... ...............1.8.1 deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 11.1torch cuda version nvcc version............... .....................11.1 11.2nvcc version deepspeed install path..................... ...........11.2 deepspeed install path['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] ...........deepspeed info ...................['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] 0.4.2+bc17042, bc17042, big-sciencedeepspeed info deepspeed wheel compiled w.................... ......0.4.2+bc17042, bc17042, big-science torch 1.8, cuda 11.1deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... 
['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_ioasync_io .............................. [NO][NO] .............. [NO][NO] transformer_inference transformer_inference.. ..[NO] [NO]....... .......[OKAY] [OKAY] utils utils.................. ..................[YES] [YES]...... ......[OKAY] [OKAY] quantizer quantizer.............. ..............[NO] [NO]....... .......[OKAY] [OKAY] -------------------------------------------------- -------------------------------------------------- /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info: torch install pathDeepSpeed general environment info: ............... torch install path['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] ............... torch version .................... 1.8.1 ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch cuda version ...............torch version 11.1.................... nvcc version1.8.1 ..................... 11.2torch cuda version deepspeed install path............... ...........11.1 nvcc version['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] .....................deepspeed info 11.2................... 
deepspeed install path0.4.2+bc17042, bc17042, big-science ...........deepspeed wheel compiled w. ......['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] torch 1.8, cuda 11.1 deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown ******** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown ****  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO]async_io ....... ...............[NO] [NO] ....... [NO] transformer_inference .. [NO] transformer_inference....... ..[OKAY] [NO] ....... [OKAY] utils .................. [YES] utils...... ..................[OKAY] [YES] ......quantizer [OKAY].............. [NO] quantizer....... ..............[OKAY] [NO] ....... [OKAY]-------------------------------------------------- --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ...............async_io [NO]............... .......[NO] [NO]....... [NO] transformer_inference .. [NO]transformer_inference ......... [OKAY][NO] ....... [OKAY] utils .................. 
[YES] utils...... ..................[OKAY] [YES] ...... [OKAY] quantizer .............. [NO]quantizer ..................... [NO][OKAY] ....... [OKAY] -------------------------------------------------- -------------------------------------------------- DeepSpeed general environment info:DeepSpeed general environment info: torch install pathtorch install path .............................. ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch versiontorch version ........................................ 1.8.11.8.1 torch cuda versiontorch cuda version .............................. 11.111.1 nvcc versionnvcc version .......................................... 11.211.2 deepspeed install pathdeepspeed install path ...................... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed infodeepspeed info ................... ................... 0.4.2+bc17042, bc17042, big-science 0.4.2+bc17042, bc17042, big-sciencedeepspeed wheel compiled w. deepspeed wheel compiled w....... ......torch 1.8, cuda 11.1 torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... 
torch 1.8, cuda 11.1 ---------------------------------------------------------------------------------------------------- DeepSpeed C++/CUDA extension op reportDeepSpeed C++/CUDA extension op report -------------------------------------------------- ----------------------------------------------------------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.-------------------------------------------------- DeepSpeed C++/CUDA extension op report DeepSpeed C++/CUDA extension op report--------------------------------------------------JIT compiled ops requires ninja-------------------------------------------------- --------------------------------------------------JIT compiled ops requires ninjaNOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.-------------------------------------------------- --------------------------------------------------JIT compiled ops requires ninja JIT compiled ops requires ninja /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown ******** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info: torch install path ............... 
['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 **** Git info for Megatron: git_hash=unknown git_branch=unknown ******** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']DeepSpeed general environment info: torch version .................... 1.8.1 torch install path torch cuda version............... ............... 11.1 nvcc version ..................... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']11.2 deepspeed install path torch version........... ....................['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] 1.8.1 deepspeed info ...................torch cuda version 0.4.2+bc17042, bc17042, big-science ............... 
deepspeed wheel compiled w.11.1 ...... nvcc versiontorch 1.8, cuda 11.1 ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** ninjaninjaninja ninja ...................................................... .................. [OKAY][OKAY] [OKAY]--------------------------------------------------[OKAY]-------------------------------------------------- ----------------------------------------------------------------------------------------------------op nameop name op name ................op name ................ ................................installedinstalled installedinstalled.. .. .. ..compatiblecompatiblecompatible compatible-------------------------------------------------- -------------------------------------------------- -------------------------------------------------- -------------------------------------------------- cpu_adam ...............cpu_adam cpu_adam [YES] cpu_adam............... ............... ...... 
...............[YES][YES] [OKAY] [YES] ............ [OKAY]......[OKAY] fused_adam[OKAY] ............. [NO] ....... fused_adam[OKAY] fused_adam............. .............fused_adam fused_lamb .............[NO][NO] ............. [NO] .............. [NO] ....... [OKAY] [OKAY] ....... [OKAY] fused_lamb[OKAY]fused_lamb ............. fused_lamb ............. [NO] ............. [NO] .......[NO]....... sparse_attn .......[OKAY][OKAY] [OKAY]............ [NO] ....... [OKAY] transformer ............ [NO]sparse_attnsparse_attn sparse_attn............................... ............[NO] [NO][OKAY] [NO] ....... .......stochastic_transformer ....... [OKAY][OKAY]. [OKAY] transformer[NO] transformer ................... transformer[NO]............ [OKAY]............ .......[NO] [NO][OKAY]....... .......[OKAY] stochastic_transformer[OKAY] .stochastic_transformer stochastic_transformer[NO] ........ . [NO] [OKAY] .......[NO] [OKAY]....... [OKAY] /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown ******** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info: DeepSpeed general environment info:torch install path ............... torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] .................... 1.8.1torch version ....................torch cuda version 1.8.1............... 11.1 torch cuda versionnvcc version ............... 11.1 nvcc version ..................... 11.2 .....................deepspeed install path 11.2........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ...................deepspeed install path 0.4.2+bc17042, bc17042, big-science........... deepspeed wheel compiled w. ...... 
torch 1.8, cuda 11.1 ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown ******** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info:DeepSpeed general environment info: torch install pathtorch install path .............................. ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch versiontorch version ........................................ 1.8.11.8.1 torch cuda versiontorch cuda version .............................. 11.111.1 nvcc versionnvcc version .......................................... 11.211.2 deepspeed install pathdeepspeed install path ...................... 
['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed infodeepspeed info ...................................... 0.4.2+bc17042, bc17042, big-science0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w.deepspeed wheel compiled w. ............ torch 1.8, cuda 11.1torch 1.8, cuda 11.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. 
[WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. [NO] ....... [OKAY] -------------------------------------------------- [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] async_io ............... [NO]transformer_inference ......... [NO][NO] ....... [OKAY] **** Git info for Megatron: git_hash=unknown git_branch=unknown **** utils .................. transformer_inference[YES] ........ [NO][OKAY] ....... [OKAY] quantizer .............. [NO] .......utils [OKAY].................. [YES] ...... [OKAY] -------------------------------------------------- quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1DeepSpeed general environment info: nvcc version ..................... 11.2 DeepSpeed general environment info: deepspeed install pathtorch install path ........... ............... 
['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] deepspeed info ................... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] 0.4.2+bc17042, bc17042, big-science torch versiondeepspeed wheel compiled w. .......................... 1.8.1torch 1.8, cuda 11.1 torch version .................... 1.8.1 torch cuda version ............... 11.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: DeepSpeed general environment info:torch install path ............... torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'].................... 1.8.1 torch version torch cuda version.................... ...............1.8.1 11.1 nvcc versiontorch cuda version .................................... 11.211.1 deepspeed install pathnvcc version ................................ 11.2['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed install pathdeepspeed info .............................. 
0.4.2+bc17042, bc17042, big-science['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed wheel compiled w.deepspeed info ......................... torch 1.8, cuda 11.10.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... DeepSpeed general environment info:['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1torch install path torch cuda version............... ............... 11.1 nvcc version ..................... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']11.2 deepspeed install pathtorch version ............................... 1.8.1['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed infotorch cuda version .................................. 0.4.2+bc17042, bc17042, big-science11.1 deepspeed wheel compiled w.nvcc version ........................... torch 1.8, cuda 11.111.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... 
['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']
deepspeed info ................... 0.4.2+bc17042, bc17042, big-science
deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1
DeepSpeed general environment info:
torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']
torch version .................... 1.8.1
torch cuda version ............... 11.1
nvcc version ..................... 11.2
deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']
deepspeed info ................... 0.4.2+bc17042, bc17042, big-science
deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1
/bin/sh: line 0: type: git: not found
**** Git info for Megatron: git_hash=unknown git_branch=unknown ****
 [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`.
async_io ............... [NO] ....... [NO]
transformer_inference .. [NO] ....... [OKAY]
utils .................. [YES] ...... [OKAY]
quantizer .............. [NO] ....... [OKAY]
--------------------------------------------------
--------------------------------------------------
DeepSpeed C++/CUDA extension op report
--------------------------------------------------
NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.
--------------------------------------------------
JIT compiled ops requires ninja
ninja .................. [OKAY]
--------------------------------------------------
op name ................ installed .. compatible
--------------------------------------------------
cpu_adam ............... [YES] ...... [OKAY]
fused_adam ............. [NO] ....... [OKAY]
fused_lamb ............. [NO] ....... [OKAY]
sparse_attn ............ [NO] ....... [OKAY]
transformer ............ [NO] ....... [OKAY]
stochastic_transformer . [NO] ....... [OKAY]
--------------------------------------------------
> padded vocab (size: 50257) with 431 dummy tokens (new size: 50688)
..compatible compatiblecompatible compatible-------------------------------------------------- -------------------------------------------------- ---------------------------------------------------------------------------------------------------- cpu_adamcpu_adam cpu_adam...............cpu_adam............... ...............[YES]...............[YES] [YES][YES] ........................ [OKAY][OKAY][OKAY] [OKAY] fused_adam fused_adam.............fused_adamfused_adam [NO] ....................................... ....... [NO] [NO][NO] [OKAY]..................... [OKAY][OKAY][OKAY]fused_lamb .............fused_lamb fused_lambfused_lamb [NO] .............................................. [NO][OKAY] [NO][NO]....... .......[OKAY]....... [OKAY] [OKAY] sparse_attn ............ [NO] ....... [OKAY] sparse_attn sparse_attntransformer............sparse_attn ........................[NO]............ ....... [NO][NO][NO] [OKAY].............. ....... [OKAY] [OKAY] transformer[OKAY] ............transformerstochastic_transformer transformer [NO]............ ....................[NO] [NO][OKAY].......[NO] ....... [OKAY] ....... [OKAY] [OKAY]stochastic_transformer stochastic_transformer .stochastic_transformer .[NO] . .......[NO] [NO][OKAY]....... .......[OKAY] [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. [NO] ....... [OKAY] utils .................. [YES] ...... 
[OKAY] quantizerasync_io ............................. [NO][NO] .............. [OKAY][NO] -------------------------------------------------- transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed general environment info:DeepSpeed general environment info: torch install path torch install path............... ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version torch version.................... ....................1.8.1 1.8.1 torch cuda version torch cuda version............... ...............11.1 11.1nvcc version nvcc version..................... .....................11.2 11.2 deepspeed install pathdeepspeed install path ...................... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed infodeepspeed info ...................................... 0.4.2+bc17042, bc17042, big-science0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... deepspeed wheel compiled w.torch 1.8, cuda 11.1 ...... 
torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown ******** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info:DeepSpeed general environment info: torch install path ...............torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version torch version.................... ....................1.8.1 1.8.1 torch cuda version torch cuda version............... ...............11.1 11.1 nvcc version nvcc version..................... .....................11.2 11.2 deepspeed install path deepspeed install path........... ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info deepspeed info................... ................... 0.4.2+bc17042, bc17042, big-science0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w.deepspeed wheel compiled w. ............ torch 1.8, cuda 11.1torch 1.8, cuda 11.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. 
async_ioasync_io ............... ...............async_io[NO] [NO] ............................. [NO][NO][NO] ....... [NO] transformer_inferencetransformer_inference ..transformer_inference .. [NO] .. [NO] ....... [NO] ....... [OKAY] ....... [OKAY] [OKAY] utilsutils utils .................. .................. .................. [YES] [YES] [YES] ...... ...... ...... [OKAY] [OKAY] [OKAY] quantizerquantizer quantizer............................ ..............[NO][NO] [NO].............. [OKAY][OKAY]....... [OKAY] ---------------------------------------------------------------------------------------------------- -------------------------------------------------- /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... 
['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info: torch install path ............... 
['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... DeepSpeed general environment info: ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch versiontorch install path .................... ...............1.8.1 torch cuda version ............... 11.1['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] nvcc version .....................torch version 11.2.................... deepspeed install path1.8.1 ........... torch cuda version ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']............... deepspeed info11.1 ...................nvcc version .....................0.4.2+bc17042, bc17042, big-science 11.2 deepspeed wheel compiled w. deepspeed install path...... ...........torch 1.8, cuda 11.1 ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 > setting codecarbon ... DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 
0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. ---------------------------------------------------------------------------------------------------- JIT compiled ops requires ninja DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. 
Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninjaninja .................................... [OKAY][OKAY] ---------------------------------------------------------------------------------------------------- op nameop name ................................ installedinstalled .... compatiblecompatible ---------------------------------------------------------------------------------------------------- cpu_adamcpu_adam .............................. [YES][YES] ............ [OKAY][OKAY] fused_adamfused_adam .......................... [NO][NO] .............. [OKAY][OKAY] fused_lamb .............fused_lamb [NO]............. .......[NO] [OKAY]....... [OKAY] sparse_attn ............sparse_attn [NO]............ .......[NO] [OKAY] ....... [OKAY] transformertransformer ........................ [NO][NO] .............. [OKAY][OKAY] stochastic_transformer stochastic_transformer . [NO]. .......[NO] [OKAY]....... [OKAY] ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO]ninja ....... ..................[OKAY] [OKAY] -------------------------------------------------- op name ................ installed .. compatiblesparse_attn --------------------------------------------------............ [NO] ....... [OKAY] transformercpu_adam ............ ...............[NO] [YES]....... ......[OKAY] [OKAY] stochastic_transformer . [NO] ....... fused_adam[OKAY] ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... 
[OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_ioasync_io .............................. [NO][NO] .............. [NO][NO] transformer_inference .. transformer_inference[NO] ......... [NO] [OKAY]....... [OKAY] utils ..................utils [YES].................. ......[YES] [OKAY]...... [OKAY] quantizer .............. [NO] .......quantizer [OKAY].............. [NO] .......-------------------------------------------------- [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_ioasync_io .............................. [NO][NO] .............. [NO] [NO] transformer_inferencetransformer_inference .... [NO][NO] .............. [OKAY][OKAY] utilsutils .................................... [YES][YES] ............ [OKAY][OKAY] quantizerquantizer ............................ [NO][NO] .............. [OKAY][OKAY] ---------------------------------------------------------------------------------------------------- DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... 
['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... DeepSpeed general environment info: ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']torch install path ...............torch version .................... 1.8.1 torch cuda version['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] ............... 11.1 torch version nvcc version.................... .....................1.8.1 11.2 torch cuda versiondeepspeed install path .......................... 11.1 ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']nvcc version .....................deepspeed info 11.2................... deepspeed install path 0.4.2+bc17042, bc17042, big-science........... deepspeed wheel compiled w.['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] ...... deepspeed infotorch 1.8, cuda 11.1 ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... 
torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** ---------------------------------------------------------------------------------------------------- --------------------------------------------------DeepSpeed C++/CUDA extension op reportDeepSpeed C++/CUDA extension op report ----------------------------------------------------------------------------------------------------DeepSpeed C++/CUDA extension op report NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.-------------------------------------------------- ----------------------------------------------------------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. --------------------------------------------------JIT compiled ops requires ninjaJIT compiled ops requires ninja-------------------------------------------------- DeepSpeed C++/CUDA extension op reportJIT compiled ops requires ninja -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. 
-------------------------------------------------- JIT compiled ops requires ninja /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** ninjaninjaninjaninja .................................... .................. ..................[OKAY][OKAY][OKAY] [OKAY]---------------------------------------------------------------------------------------------------- -------------------------------------------------- op name --------------------------------------------------op nameop name................ ................ op name installedinstalled................ .. .................. installed compatiblecompatible installed .. ---------------------------------------------------------------------------------------------------- .. compatible compatible ---------------------------------------------------------------------------------------------------- cpu_adamcpu_adam .............................. [YES][YES] cpu_adamcpu_adam...... .....................[OKAY]............... [YES][OKAY][YES] ............ [OKAY][OKAY] fused_adam ............. [NO] .......fused_adam [OKAY]............. fused_adam fused_adam [NO] ..........................fused_lamb....... [OKAY][NO].............[NO] [NO]....... .......fused_lamb....... [OKAY][OKAY][OKAY]............. [NO] fused_lamb.......fused_lamb [OKAY].......................... [NO][NO] .............. sparse_attn[OKAY][OKAY] ............ [NO] ....... sparse_attn[OKAY] ............ [NO] transformer....... ............[OKAY]sparse_attn sparse_attn [NO]............ transformer................... [NO][NO]............[OKAY] ..............[NO] [OKAY]stochastic_transformer[OKAY] ....... [OKAY]. transformer transformer [NO] stochastic_transformer ............ .............[NO]....... [NO][NO][OKAY] ..................... [OKAY][OKAY][OKAY] stochastic_transformerstochastic_transformer .. 
[NO][NO] .............. [OKAY][OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_ioasync_io .............................. [NO][NO] .............. [NO][NO] transformer_inferencetransformer_inference .... [NO][NO] .............. [OKAY][OKAY] utils utils.................. ..................[YES] [YES]...... ......[OKAY] [OKAY] quantizerquantizer ............................ [NO][NO] .............. [OKAY][OKAY] ---------------------------------------------------------------------------------------------------- > initializing torch distributed ... > setting tensorboard ... DeepSpeed general environment info:DeepSpeed general environment info: torch install pathtorch install path .............................. ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch versiontorch version ........................................ 1.8.11.8.1 torch cuda versiontorch cuda version .............................. 11.111.1 nvcc versionnvcc version .......................................... 
11.211.2 deepspeed install pathdeepspeed install path ...................... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed infodeepspeed info ...................................... 0.4.2+bc17042, bc17042, big-science0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w.deepspeed wheel compiled w. ............ torch 1.8, cuda 11.1torch 1.8, cuda 11.1 DeepSpeed general environment info:DeepSpeed general environment info: torch install pathtorch install path .............................. ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch versiontorch version ........................................ 1.8.11.8.1 torch cuda version torch cuda version............... ...............11.1 11.1nvcc version nvcc version..................... 11.2..................... deepspeed install path11.2 ...........deepspeed install path ...........['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] ...................deepspeed info 0.4.2+bc17042, bc17042, big-science................... deepspeed wheel compiled w.0.4.2+bc17042, bc17042, big-science ......deepspeed wheel compiled w. torch 1.8, cuda 11.1...... 
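The op report above boils down to simple dependency probing: JIT-compiled ops need the `ninja` build tool on `PATH`, and `async_io` needs libaio. A minimal sketch of that kind of check, using only the standard library (the `probe` helper and its output format are illustrative, not DeepSpeed's actual code):

```python
import shutil
import ctypes.util

def probe(name, found):
    """Format a dependency check result in the style of the report above."""
    return f"{name} {'[OKAY]' if found else '[NO]'}"

# JIT compilation of the C++/CUDA ops requires the ninja build tool.
ninja_ok = shutil.which("ninja") is not None
# The async_io op requires the libaio shared library (apt install libaio-dev).
libaio_ok = ctypes.util.find_library("aio") is not None

print(probe("ninja", ninja_ok))
print(probe("async_io", libaio_ok))
```

On the cluster above, `ninja` was found (`[OKAY]`) while libaio was not, which is why `async_io` reports `[NO]` and the `libaio-dev` warning is emitted.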
/bin/sh: line 0: type: git: not found
**** Git info for Megatron: git_hash=unknown git_branch=unknown ****
> initializing tensor model parallel with size 4
> initializing pipeline model parallel with size 8
> setting random seeds to 43 ...
[2021-09-27 17:43:40,715] [INFO] [checkpointing.py:226:model_parallel_cuda_manual_seed] > initializing model parallel cuda seeds on global rank 0, model parallel rank 0, and data parallel rank 0 with model parallel seed: 2761 and data parallel seed: 43
> compiling dataset index builder ...
make: Entering directory '/gpfsssd/worksf/projects/rech/six/commun/code/tr8-104B/Megatron-DeepSpeed-tr8-104B/megatron/data'
make: Nothing to be done for 'default'.
make: Leaving directory '/gpfsssd/worksf/projects/rech/six/commun/code/tr8-104B/Megatron-DeepSpeed-tr8-104B/megatron/data'
>>> done with dataset index builder. Compilation time: 0.304 seconds
> compiling and loading fused kernels ...
/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning: !! WARNING !!
Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension. See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source.
warnings.warn(WRONG_COMPILER_WARNING.format(
Detected CUDA files, patching ldflags
Emitting ninja build file /gpfsssd/worksf/projects/rech/six/commun/code/tr8-104B/Megatron-DeepSpeed-tr8-104B/megatron/fused_kernels/build/build.ninja...
Building extension module scaled_upper_triang_masked_softmax_cuda...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
Loading extension module scaled_upper_triang_masked_softmax_cuda...
Building extension module scaled_masked_softmax_cuda...
ninja: no work to do.
Loading extension module scaled_masked_softmax_cuda...
Building extension module fused_mix_prec_layer_norm_cuda...
ninja: no work to do.
Loading extension module fused_mix_prec_layer_norm_cuda...
warnings.warn(WRONG_COMPILER_WARNING.format( /gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning: !! WARNING !! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension. See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source. !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !! WARNING !! warnings.warn(WRONG_COMPILER_WARNING.format( /gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning: !! WARNING !! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension. See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source. !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !! WARNING !! warnings.warn(WRONG_COMPILER_WARNING.format( /gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning: !! WARNING !! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to to compile your extension. 
Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension. See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source. !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !! WARNING !! warnings.warn(WRONG_COMPILER_WARNING.format( /gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning: !! WARNING !! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension. See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source. !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !! WARNING !! warnings.warn(WRONG_COMPILER_WARNING.format( /gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning: !! WARNING !! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension. See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source. !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !! WARNING !! warnings.warn(WRONG_COMPILER_WARNING.format( /gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning: !! WARNING !! 
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension. See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source. !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !! WARNING !! warnings.warn(WRONG_COMPILER_WARNING.format( /gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning: !! WARNING !! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension. See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source. !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !! WARNING !! warnings.warn(WRONG_COMPILER_WARNING.format( /gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning: !! WARNING !! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension. See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source. 
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !! WARNING !! warnings.warn(WRONG_COMPILER_WARNING.format( /gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning: !! WARNING !! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension. See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source. !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !! WARNING !! warnings.warn(WRONG_COMPILER_WARNING.format( /gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning: !! WARNING !! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension. See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source. !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !! WARNING !! warnings.warn(WRONG_COMPILER_WARNING.format( /gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning: !! WARNING !! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. 
Please use g++ to to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension. See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source. !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !! WARNING !! warnings.warn(WRONG_COMPILER_WARNING.format( /gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning: !! WARNING !! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension. See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source. !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !! WARNING !! warnings.warn(WRONG_COMPILER_WARNING.format( /gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning: !! WARNING !! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension. See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source. !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !! WARNING !! 
warnings.warn(WRONG_COMPILER_WARNING.format( /gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning: !! WARNING !! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension. See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source. !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !! WARNING !! warnings.warn(WRONG_COMPILER_WARNING.format( /gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning: !! WARNING !! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension. See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source. !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !! WARNING !! warnings.warn(WRONG_COMPILER_WARNING.format( /gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning: !! WARNING !! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to to compile your extension. 
Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension. See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source. !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !! WARNING !! warnings.warn(WRONG_COMPILER_WARNING.format( /gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning: !! WARNING !! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension. See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source. !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !! WARNING !! warnings.warn(WRONG_COMPILER_WARNING.format( /gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning: !! WARNING !! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension. See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source. !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !! WARNING !! warnings.warn(WRONG_COMPILER_WARNING.format( /gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning: !! WARNING !! 
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension. See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source. !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !! WARNING !! warnings.warn(WRONG_COMPILER_WARNING.format( /gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning: !! WARNING !! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension. See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source. !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !! WARNING !! warnings.warn(WRONG_COMPILER_WARNING.format( /gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning: !! WARNING !! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension. See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source. 
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !! WARNING !! warnings.warn(WRONG_COMPILER_WARNING.format( /gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning: !! WARNING !! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension. See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source. !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !! WARNING !! warnings.warn(WRONG_COMPILER_WARNING.format( /gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning: !! WARNING !! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension. See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source. !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !! WARNING !! warnings.warn(WRONG_COMPILER_WARNING.format( /gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning: !! WARNING !! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. 
Please use g++ to to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension. See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source. !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !! WARNING !! warnings.warn(WRONG_COMPILER_WARNING.format( /gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning: !! WARNING !! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension. See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source. !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !! WARNING !! warnings.warn(WRONG_COMPILER_WARNING.format( /gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning: !! WARNING !! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension. See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source. !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !! WARNING !! 
warnings.warn(WRONG_COMPILER_WARNING.format( /gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning: !! WARNING !! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension. See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source. !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !! WARNING !! warnings.warn(WRONG_COMPILER_WARNING.format( /gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning: !! WARNING !! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension. See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source. !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !! WARNING !! warnings.warn(WRONG_COMPILER_WARNING.format( /gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning: !! WARNING !! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to to compile your extension. 
Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension. See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source. !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !! WARNING !! warnings.warn(WRONG_COMPILER_WARNING.format( >>> done with compiling and loading fused kernels. Compilation time: 22.376 seconds time to initialize megatron (seconds): 67.410 [after megatron is initialized] datetime: 2021-09-27 17:44:03 building GPT model ... [2021-09-27 17:44:03,479] [INFO] [utils.py:680:see_memory_usage] Before Building Model /gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch/cuda/memory.py:373: FutureWarning: torch.cuda.memory_cached has been renamed to torch.cuda.memory_reserved warnings.warn( /gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch/cuda/memory.py:381: FutureWarning: torch.cuda.max_memory_cached has been renamed to torch.cuda.max_memory_reserved warnings.warn( [2021-09-27 17:44:03,481] [INFO] [utils.py:681:see_memory_usage] MA 0.0 GB Max_MA 0.0 GB CA 0.0 GB Max_CA 0 GB [2021-09-27 17:44:03,481] [INFO] [utils.py:689:see_memory_usage] CPU Virtual Memory: used = 37.48 GB, percent = 20.0% SEED_LAYERS=False BASE_SEED=1234 SEED_FN=None Using topology: {ProcessCoord(pipe=0, data=0, model=0): 0, ProcessCoord(pipe=0, data=0, model=1): 1, ProcessCoord(pipe=0, data=0, model=2): 2, ProcessCoord(pipe=0, data=0, model=3): 3, ProcessCoord(pipe=0, data=1, model=0): 4, ProcessCoord(pipe=0, data=1, model=1): 5, ProcessCoord(pipe=0, data=1, model=2): 6, ProcessCoord(pipe=0, data=1, model=3): 7, ProcessCoord(pipe=0, data=2, model=0): 8, ProcessCoord(pipe=0, data=2, model=1): 9, ProcessCoord(pipe=0, data=2, model=2): 10, ProcessCoord(pipe=0, data=2, model=3): 11, ProcessCoord(pipe=0, data=3, model=0): 12, ProcessCoord(pipe=0, data=3, model=1): 13, ProcessCoord(pipe=0, data=3, model=2): 14, 
ProcessCoord(pipe=0, data=3, model=3): 15, ProcessCoord(pipe=0, data=4, model=0): 16, ProcessCoord(pipe=0, data=4, model=1): 17, ProcessCoord(pipe=0, data=4, model=2): 18, ProcessCoord(pipe=0, data=4, model=3): 19, ProcessCoord(pipe=0, data=5, model=0): 20, ProcessCoord(pipe=0, data=5, model=1): 21, ProcessCoord(pipe=0, data=5, model=2): 22, ProcessCoord(pipe=0, data=5, model=3): 23, ProcessCoord(pipe=0, data=6, model=0): 24, ProcessCoord(pipe=0, data=6, model=1): 25, ProcessCoord(pipe=0, data=6, model=2): 26, ProcessCoord(pipe=0, data=6, model=3): 27, ProcessCoord(pipe=0, data=7, model=0): 28, ProcessCoord(pipe=0, data=7, model=1): 29, ProcessCoord(pipe=0, data=7, model=2): 30, ProcessCoord(pipe=0, data=7, model=3): 31, ProcessCoord(pipe=1, data=0, model=0): 32, ProcessCoord(pipe=1, data=0, model=1): 33, ProcessCoord(pipe=1, data=0, model=2): 34, ProcessCoord(pipe=1, data=0, model=3): 35, ProcessCoord(pipe=1, data=1, model=0): 36, ProcessCoord(pipe=1, data=1, model=1): 37, ProcessCoord(pipe=1, data=1, model=2): 38, ProcessCoord(pipe=1, data=1, model=3): 39, ProcessCoord(pipe=1, data=2, model=0): 40, ProcessCoord(pipe=1, data=2, model=1): 41, ProcessCoord(pipe=1, data=2, model=2): 42, ProcessCoord(pipe=1, data=2, model=3): 43, ProcessCoord(pipe=1, data=3, model=0): 44, ProcessCoord(pipe=1, data=3, model=1): 45, ProcessCoord(pipe=1, data=3, model=2): 46, ProcessCoord(pipe=1, data=3, model=3): 47, ProcessCoord(pipe=1, data=4, model=0): 48, ProcessCoord(pipe=1, data=4, model=1): 49, ProcessCoord(pipe=1, data=4, model=2): 50, ProcessCoord(pipe=1, data=4, model=3): 51, ProcessCoord(pipe=1, data=5, model=0): 52, ProcessCoord(pipe=1, data=5, model=1): 53, ProcessCoord(pipe=1, data=5, model=2): 54, ProcessCoord(pipe=1, data=5, model=3): 55, ProcessCoord(pipe=1, data=6, model=0): 56, ProcessCoord(pipe=1, data=6, model=1): 57, ProcessCoord(pipe=1, data=6, model=2): 58, ProcessCoord(pipe=1, data=6, model=3): 59, ProcessCoord(pipe=1, data=7, model=0): 60, ProcessCoord(pipe=1, 
data=7, model=1): 61, ProcessCoord(pipe=1, data=7, model=2): 62, ProcessCoord(pipe=1, data=7, model=3): 63,
ProcessCoord(pipe=2, data=0, model=0): 64, ..., ProcessCoord(pipe=2, data=7, model=3): 95,
ProcessCoord(pipe=3, data=0, model=0): 96, ..., ProcessCoord(pipe=3, data=7, model=3): 127,
ProcessCoord(pipe=4, data=0, model=0): 128, ..., ProcessCoord(pipe=4, data=7, model=3): 159,
ProcessCoord(pipe=5, data=0, model=0): 160, ..., ProcessCoord(pipe=5, data=7, model=3): 191,
ProcessCoord(pipe=6, data=0, model=0): 192, ..., ProcessCoord(pipe=6, data=7, model=3): 223,
ProcessCoord(pipe=7, data=0, model=0): 224, ..., ProcessCoord(pipe=7, data=4, model=2): 242,
ProcessCoord(pipe=7, data=4, model=3): 243, ..., ProcessCoord(pipe=7, data=7, model=3): 255}
[2021-09-27 17:44:04,887] [INFO] [module.py:360:_partition_layers] Partitioning pipeline stages with method type:transformer
stage=0 layers=7
    0: _to_float16
    1: EmbeddingPipe
    2:
    3: ParallelTransformerLayerPipe
    4: ParallelTransformerLayerPipe
    5: ParallelTransformerLayerPipe
    6: ParallelTransformerLayerPipe
stage=1 layers=4 (7-10: ParallelTransformerLayerPipe)
stage=2 layers=4 (11-14: ParallelTransformerLayerPipe)
stage=3 layers=4 (15-18: ParallelTransformerLayerPipe)
stage=4 layers=4 (19-22: ParallelTransformerLayerPipe)
stage=5 layers=4 (23-26: ParallelTransformerLayerPipe)
stage=6 layers=4 (27-30: ParallelTransformerLayerPipe)
stage=7 layers=8
    31: ParallelTransformerLayerPipe
    32: ParallelTransformerLayerPipe
    33: ParallelTransformerLayerPipe
    34: ParallelTransformerLayerPipe
    35:
    36: MixedFusedLayerNorm
    37: EmbeddingPipe
    38: float16_to_fp32
loss:
CrossEntropy
> number of parameters on (tensor, pipeline) model parallel ranks (0-3, 1) through (0-3, 6): 1745293312 each
> number of parameters on (tensor, pipeline) model parallel ranks (0-3, 0): 1986465792 each
> number of parameters on (tensor, pipeline) model parallel ranks (0-3, 7): 1986498560 each
[2021-09-27 17:44:06,136] [INFO] [utils.py:680:see_memory_usage] After Building Model
[2021-09-27 17:44:06,136] [INFO] [utils.py:681:see_memory_usage] MA 3.77 GB Max_MA 3.79 GB CA 3.79 GB Max_CA 4 GB
[2021-09-27 17:44:06,137] [INFO] [utils.py:689:see_memory_usage] CPU Virtual Memory: used = 37.66 GB, percent = 20.1%
setting training iterations to 159576
> learning rate decay style: cosine
DeepSpeed is enabled.
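The topology and parameter counts reported above are internally consistent, which can be checked with a few lines of Python. This is a sketch built only from the numbers in this log; the helper name `coord_to_rank` is ours, not Megatron's or DeepSpeed's:

```python
# Topology from the log: 8 pipeline stages x 8 data replicas x 4 tensor ranks.
PIPE, DATA, MODEL = 8, 8, 4

def coord_to_rank(pipe, data, model):
    # Matches the ProcessCoord table above: model varies fastest, then data, then pipe.
    return (pipe * DATA + data) * MODEL + model

assert coord_to_rank(2, 0, 0) == 64    # as printed in the mapping
assert coord_to_rank(7, 7, 3) == 255   # last entry of the mapping
assert PIPE * DATA * MODEL == 256      # world size (256 processes)

# Per-(tensor, pipeline)-rank parameter counts from the log:
first_stage, mid_stage, last_stage = 1986465792, 1745293312, 1986498560
total = MODEL * (first_stage + last_stage) + MODEL * (PIPE - 2) * mid_stage
assert total == 57778896896            # TOTAL_PARAMS in the engine report below
```

TOTAL_PARAMS sums every stage's copy of its parameters, so the tied embeddings on the first and last stages are counted twice; UNIQUE_PARAMS (56814206976) counts each shared parameter once.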
[2021-09-27 17:44:06,204] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed info: version=0.4.2+bc17042, git-hash=bc17042, git-branch=big-science
[2021-09-27 17:44:06,292] [INFO] [engine.py:179:__init__] DeepSpeed Flops Profiler Enabled: False
[2021-09-27 17:44:06,292] [INFO] [engine.py:736:_configure_optimizer] Removing param_group that has no 'params' in the client Optimizer
[2021-09-27 17:44:06,292] [INFO] [engine.py:741:_configure_optimizer] Using client Optimizer as basic optimizer
[2021-09-27 17:44:06,292] [INFO] [engine.py:750:_configure_optimizer] DeepSpeed Basic Optimizer = FusedAdam
[2021-09-27 17:44:06,292] [INFO] [utils.py:43:is_zero_supported_optimizer] Checking ZeRO support for optimizer=FusedAdam type=
[2021-09-27 17:44:06,292] [INFO] [logging.py:68:log_dist] [Rank 0] Creating fp16 ZeRO stage 1 optimizer
[2021-09-27 17:44:06,292] [INFO] [stage2.py:106:__init__] Reduce bucket size 500000000
[2021-09-27 17:44:06,292] [INFO] [stage2.py:107:__init__] Allgather bucket size 500000000
[2021-09-27 17:44:06,292] [INFO] [stage2.py:108:__init__] CPU Offload: False
[2021-09-27 17:44:06,292] [INFO] [stage2.py:109:__init__] Round robin gradient partitioning: False
[2021-09-27 17:44:11,004] [INFO] [stage2.py:419:__init__] optimizer state initialized
[2021-09-27 17:44:11,004] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed Final Optimizer = FusedAdam
[2021-09-27 17:44:11,004] [INFO] [engine.py:553:_configure_lr_scheduler] DeepSpeed using client LR scheduler
[2021-09-27 17:44:11,005] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed LR Scheduler =
[2021-09-27 17:44:11,005] [INFO] [logging.py:68:log_dist] [Rank 0] step=0, skipped=0, lr=[0.0, 0.0], mom=[(0.9, 0.999), (0.9, 0.999)]
[2021-09-27 17:44:11,005] [INFO] [config.py:900:print] DeepSpeedEngine configuration:
  activation_checkpointing_config {"partition_activations": false, "contiguous_memory_optimization": false, "cpu_checkpointing": false, "number_checkpoints": null, "synchronize_checkpoint_boundary": false, "profile": false}
  aio_config ................... {'block_size': 1048576, 'queue_depth': 8, 'thread_count': 1, 'single_submit': False, 'overlap_events': True}
  allreduce_always_fp32 ........ False
  amp_enabled .................. False
  amp_params ................... False
  checkpoint_tag_validation_enabled True
  checkpoint_tag_validation_fail False
  disable_allgather ............ False
  dump_state ................... False
  dynamic_loss_scale_args ...... {'init_scale': 4096, 'scale_window': 500, 'delayed_shift': 2, 'min_scale': 1}
  eigenvalue_enabled ........... False
  eigenvalue_gas_boundary_resolution 1
  eigenvalue_layer_name ........ bert.encoder.layer
  eigenvalue_layer_num ......... 0
  eigenvalue_max_iter .......... 100
  eigenvalue_stability ......... 1e-06
  eigenvalue_tol ............... 0.01
  eigenvalue_verbose ........... False
  elasticity_enabled ........... False
  flops_profiler_config ........ {"enabled": false, "profile_step": 1, "module_depth": -1, "top_modules": 1, "detailed": true, "output_file": null}
  fp16_enabled ................. True
  fp16_mixed_quantize .......... False
  global_rank .................. 0
  gradient_accumulation_steps .. 256
  gradient_clipping ............ 1.0
  gradient_predivide_factor .... 1.0
  initial_dynamic_scale ........ 4096
  loss_scale ................... 0
  memory_breakdown ............. False
  optimizer_legacy_fusion ...... False
  optimizer_name ............... None
  optimizer_params ............. None
  pipeline ..................... {'stages': 'auto', 'partition': 'best', 'seed_layers': False, 'activation_checkpoint_interval': 0}
  pld_enabled .................. False
  pld_params ................... False
  prescale_gradients ........... False
  quantize_change_rate ......... 0.001
  quantize_groups .............. 1
  quantize_offset .............. 1000
  quantize_period .............. 1000
  quantize_rounding ............ 0
  quantize_start_bits .......... 16
  quantize_target_bits ......... 8
  quantize_training_enabled .... False
  quantize_type ................ 0
  quantize_verbose ............. False
  scheduler_name ............... None
  scheduler_params ............. None
  sparse_attention ............. None
  sparse_gradients_enabled ..... False
  steps_per_print .............. 2000
  tensorboard_enabled .......... False
  tensorboard_job_name ......... DeepSpeedJobName
  tensorboard_output_path ......
  train_batch_size ............. 2048
  train_micro_batch_size_per_gpu 1
  use_quantizer_kernel ......... False
  wall_clock_breakdown ......... False
  world_size ................... 8
  zero_allow_untested_optimizer False
  zero_config .................. {"stage": 1, "contiguous_gradients": false, "reduce_scatter": true, "reduce_bucket_size": 5.000000e+08, "allgather_partitions": true, "allgather_bucket_size": 5.000000e+08, "overlap_comm": false, "load_from_fp32_weights": true, "elastic_checkpoint": true, "offload_param": null, "offload_optimizer": null, "sub_group_size": 1.000000e+09, "prefetch_bucket_size": 5.000000e+07, "param_persistence_threshold": 1.000000e+05, "max_live_parameters": 1.000000e+09, "max_reuse_distance": 1.000000e+09, "gather_fp16_weights_on_model_save": false, "ignore_unused_parameters": true, "round_robin_gradients": false, "legacy_stage1": false}
  zero_enabled ................. True
  zero_optimization_stage ...... 1
[2021-09-27 17:44:11,007] [INFO] [config.py:906:print] json = {"train_micro_batch_size_per_gpu": 1, "train_batch_size": 2.048000e+03, "gradient_clipping": 1.0, "zero_optimization": {"stage": 1}, "fp16": {"enabled": true, "loss_scale": 0, "loss_scale_window": 500, "hysteresis": 2, "min_loss_scale": 1, "initial_scale_power": 12}, "steps_per_print": 2.000000e+03, "wall_clock_breakdown": false}
[2021-09-27 17:44:11,007] [INFO] [engine.py:76:__init__] CONFIG: micro_batches=256 micro_batch_size=1
[2021-09-27 17:44:11,311] [INFO] [engine.py:134:__init__] RANK=0 STAGE=0 LAYERS=7 [0, 7) STAGE_PARAMS=1986465792 (1986.466M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-27 17:44:11,311] [INFO] [engine.py:134:__init__] RANK=1 STAGE=0 LAYERS=7 [0, 7) STAGE_PARAMS=1986465792 (1986.466M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-27 17:44:11,311] [INFO] [engine.py:134:__init__] RANK=3 STAGE=0 LAYERS=7 [0, 7)
STAGE_PARAMS=1986465792 (1986.466M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-27 17:44:11,311] [INFO] [engine.py:134:__init__] RANK=2 STAGE=0 LAYERS=7 [0, 7) STAGE_PARAMS=1986465792 (1986.466M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-27 17:44:11,311] [INFO] [engine.py:134:__init__] RANK=128-131 STAGE=4 LAYERS=4 [19, 23) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-27 17:44:11,311] [INFO] [engine.py:134:__init__] RANK=224-227 STAGE=7 LAYERS=8 [31, 39) STAGE_PARAMS=1986498560 (1986.499M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-27 17:44:11,311] [INFO] [engine.py:134:__init__] RANK=64-67 STAGE=2 LAYERS=4 [11, 15) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-27 17:44:11,311] [INFO] [engine.py:134:__init__] RANK=192-195 STAGE=6 LAYERS=4 [27, 31) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-27 17:44:11,311] [INFO] [engine.py:134:__init__] RANK=32-35 STAGE=1 LAYERS=4 [7, 11) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-27 17:44:11,311] [INFO] [engine.py:134:__init__] RANK=97-99 STAGE=3 LAYERS=4 [15, 19) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-27 17:44:11,311] [INFO] [engine.py:134:__init__] RANK=160-163 STAGE=5 LAYERS=4 [23, 27) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-27 17:44:11,311] [INFO] [engine.py:134:__init__] RANK=96 STAGE=3 LAYERS=4 [15, 19)
STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
> using checkpoint value 6e-05 for learning rate
> using checkpoint value 6e-06 for minimum learning rate
> using checkpoint value 216320 for warmup iterations
> using checkpoint value 126953125 for total number of iterations
> using checkpoint value cosine for decay style
successfully loaded 8 ZeRO state_dicts for ranks 196, 207, 96, 192, 195, 212, 154, 148, 198, 112, 104, 42, 199, 120, 205, 193, 116, 158, 150, 100, 166, 62, 134, 204, 67, 136, 145, 182, 133, 65, 208
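The batch-size numbers in the engine configuration above hang together: with a micro-batch of 1 per GPU, 256 gradient-accumulation steps, and an 8-way data-parallel group (DeepSpeed's `world_size ... 8` counts only data-parallel peers of the 256-process grid), the global batch comes out to 2048. A quick check, with our own variable names:

```python
# Relate the batch-size figures printed in the DeepSpeed config dump above.
micro_batch_per_gpu = 1    # train_micro_batch_size_per_gpu
grad_accum_steps = 256     # gradient_accumulation_steps / micro_batches
processes = 256
tensor_parallel = 4
pipeline_parallel = 8
data_parallel = processes // (tensor_parallel * pipeline_parallel)

train_batch_size = micro_batch_per_gpu * grad_accum_steps * data_parallel
assert data_parallel == 8        # matches world_size in the config dump
assert train_batch_size == 2048  # matches train_batch_size in the config dump
```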
successfully loaded 8 ZeRO state_dicts for ranks 124, 130, 141, 132, 206, 157, 115, 200, 152, 89, 40, 140, 75, 122, 153, 32, 216, 57, 165, 60, 63, 83, 99, 108, 77, 190, 146, 36, 114, 129, 54, 98, 220, 93, 144, 64, 76, 58, 72, 155, 103, 80, 34, 149, 87, 52, 41, 84, 37, 38, 107, 48, 44, 53, 79, 164, 46, 73, 91, 106, 71, 138, 33, 156, 201, 61, 178, 39, 111, 215, 191, 147, 167, 170, 95, 142, 151, 135, 118, 97, 159, 174, 219, 211, 180, 143, 43, 171, 55, 59, 203, 45, 210, 50, 113, 68, 128, 187, 186, 102, 109, 56, 137, 81, 169, 202, 10, 110, 197, 119, 105, 88, 92, 214, 223, 126, 162, 173, 125, 90, 121, 123, 163, 127, 51, 78, 213, 181, 194, 218, 35, 22, 188, 139, 47, 175, 168, 184, 69, 85, 66, 117, 161, 49, 86, 101, 222, 70, 30
loading 8 zero partition checkpoints for ranks 196, 96, 207, 195, 192, 154 (interleaved with the loads above in the original output)
successfully loaded 8 ZeRO state_dicts for rank 131 successfully loaded 8 ZeRO state_dicts for rank 183 loading 8 zero partition checkpoints for rank 112 successfully loaded 8 ZeRO state_dicts for rank 94 successfully loaded 8 ZeRO state_dicts for rank 217 successfully loaded 8 ZeRO state_dicts for rank 82 successfully loaded 8 ZeRO state_dicts for rank 8 successfully loaded 8 ZeRO state_dicts for rank 160 successfully loaded 8 ZeRO state_dicts for rank 252 loading 8 zero partition checkpoints for rank 205 successfully loaded 8 ZeRO state_dicts for rank 172 successfully loaded 8 ZeRO state_dicts for rank 14 loading 8 zero partition checkpoints for rank 42 loading 8 zero partition checkpoints for rank 104 loading 8 zero partition checkpoints for rank 193 successfully loaded 8 ZeRO state_dicts for rank 189 successfully loaded 8 ZeRO state_dicts for rank 232 successfully loaded 8 ZeRO state_dicts for rank 177 loading 8 zero partition checkpoints for rank 120 successfully loaded 8 ZeRO state_dicts for rank 228 successfully loaded 8 ZeRO state_dicts for rank 185 successfully loaded 8 ZeRO state_dicts for rank 209 successfully loaded 8 ZeRO state_dicts for rank 235 successfully loaded 8 ZeRO state_dicts for rank 244 successfully loaded 8 ZeRO state_dicts for rank 236 successfully loaded 8 ZeRO state_dicts for rank 31 loading 8 zero partition checkpoints for rank 116 successfully loaded 8 ZeRO state_dicts for rank 224 loading 8 zero partition checkpoints for rank 62 successfully loaded 8 ZeRO state_dicts for rank 74 loading 8 zero partition checkpoints for rank 166 loading 8 zero partition checkpoints for rank 134 successfully loaded 8 ZeRO state_dicts for rank 26 successfully loaded 8 ZeRO state_dicts for rank 176 loading 8 zero partition checkpoints for rank 204 successfully loaded 8 ZeRO state_dicts for rank 251 successfully loaded 8 ZeRO state_dicts for rank 15 successfully loaded 8 ZeRO state_dicts for rank 4 loading 8 zero partition checkpoints for rank 199 loading 
8 zero partition checkpoints for rank 133 loading 8 zero partition checkpoints for rank 198 loading 8 zero partition checkpoints for rank 67 successfully loaded 8 ZeRO state_dicts for rank 18 successfully loaded 8 ZeRO state_dicts for rank 179 loading 8 zero partition checkpoints for rank 124 successfully loaded 8 ZeRO state_dicts for rank 247 successfully loaded 8 ZeRO state_dicts for rank 11 successfully loaded 8 ZeRO state_dicts for rank 28 loading 8 zero partition checkpoints for rank 148 successfully loaded 8 ZeRO state_dicts for rank 229 loading 8 zero partition checkpoints for rank 65 successfully loaded 8 ZeRO state_dicts for rank 7 successfully loaded 8 ZeRO state_dicts for rank 248 successfully loaded 8 ZeRO state_dicts for rank 221 loading 8 zero partition checkpoints for rank 182 loading 8 zero partition checkpoints for rank 130 successfully loaded 8 ZeRO state_dicts for rank 238 successfully loaded 8 ZeRO state_dicts for rank 12 loading 8 zero partition checkpoints for rank 145 successfully loaded 8 ZeRO state_dicts for rank 234 successfully loaded 8 ZeRO state_dicts for rank 6 loading 8 zero partition checkpoints for rank 206 successfully loaded 8 ZeRO state_dicts for rank 27 successfully loaded 8 ZeRO state_dicts for rank 250 loading 8 zero partition checkpoints for rank 157 successfully loaded 8 ZeRO state_dicts for rank 225 successfully loaded 8 ZeRO state_dicts for rank 23 loading 8 zero partition checkpoints for rank 40 successfully loaded 8 ZeRO state_dicts for rank 19 successfully loaded 8 ZeRO state_dicts for rank 3 loading 8 zero partition checkpoints for rank 89 loading 8 zero partition checkpoints for rank 141 loading 8 zero partition checkpoints for rank 122 loading 8 zero partition checkpoints for rank 75 successfully loaded 8 ZeRO state_dicts for rank 239 successfully loaded 8 ZeRO state_dicts for rank 241 successfully loaded 8 ZeRO state_dicts for rank 245 successfully loaded 8 ZeRO state_dicts for rank 243 successfully loaded 8 ZeRO 
state_dicts for rank 0 successfully loaded 8 ZeRO state_dicts for rank 20 successfully loaded 8 ZeRO state_dicts for rank 24 loading 8 zero partition checkpoints for rank 140 successfully loaded 8 ZeRO state_dicts for rank 231 successfully loaded 8 ZeRO state_dicts for rank 29 loading 8 zero partition checkpoints for rank 32 successfully loaded 8 ZeRO state_dicts for rank 240 successfully loaded 8 ZeRO state_dicts for rank 2 successfully loaded 8 ZeRO state_dicts for rank 16 loading 8 zero partition checkpoints for rank 132 successfully loaded 8 ZeRO state_dicts for rank 233 successfully loaded 8 ZeRO state_dicts for rank 253 successfully loaded 8 ZeRO state_dicts for rank 255 successfully loaded 8 ZeRO state_dicts for rank 242 successfully loaded 8 ZeRO state_dicts for rank 237 loading 8 zero partition checkpoints for rank 83 successfully loaded 8 ZeRO state_dicts for rank 254 loading 8 zero partition checkpoints for rank 165 loading 8 zero partition checkpoints for rank 158 successfully loaded 8 ZeRO state_dicts for rank 246 loading 8 zero partition checkpoints for rank 77 loading 8 zero partition checkpoints for rank 99 loading 8 zero partition checkpoints for rank 152 loading 8 zero partition checkpoints for rank 216 loading 8 zero partition checkpoints for rank 36 loading 8 zero partition checkpoints for rank 115 loading 8 zero partition checkpoints for rank 54 loading 8 zero partition checkpoints for rank 190 loading 8 zero partition checkpoints for rank 146 loading 8 zero partition checkpoints for rank 98 loading 8 zero partition checkpoints for rank 100 loading 8 zero partition checkpoints for rank 150 successfully loaded 8 ZeRO state_dicts for rank 13 successfully loaded 8 ZeRO state_dicts for rank 226 successfully loaded 8 ZeRO state_dicts for rank 9 loading 8 zero partition checkpoints for rank 153 loading 8 zero partition checkpoints for rank 64 successfully loaded 8 ZeRO state_dicts for rank 5 successfully loaded 8 ZeRO state_dicts for rank 249 loading 
8 zero partition checkpoints for rank 155 loading 8 zero partition checkpoints for rank 72 successfully loaded 8 ZeRO state_dicts for rank 17 successfully loaded 8 ZeRO state_dicts for rank 230 loading 8 zero partition checkpoints for rank 80 loading 8 zero partition checkpoints for rank 149 loading 8 zero partition checkpoints for rank 76 successfully loaded 8 ZeRO state_dicts for rank 1 successfully loaded 8 ZeRO state_dicts for rank 227 loading 8 zero partition checkpoints for rank 144 successfully loaded 8 ZeRO state_dicts for rank 21 loading 8 zero partition checkpoints for rank 41 loading 8 zero partition checkpoints for rank 107 loading 8 zero partition checkpoints for rank 34 loading 8 zero partition checkpoints for rank 87 loading 8 zero partition checkpoints for rank 212 loading 8 zero partition checkpoints for rank 220 loading 8 zero partition checkpoints for rank 44 loading 8 zero partition checkpoints for rank 73 loading 8 zero partition checkpoints for rank 33 loading 8 zero partition checkpoints for rank 164 loading 8 zero partition checkpoints for rank 111 loading 8 zero partition checkpoints for rank 106 loading 8 zero partition checkpoints for rank 167 loading 8 zero partition checkpoints for rank 39 loading 8 zero partition checkpoints for rank 46 loading 8 zero partition checkpoints for rank 201 loading 8 zero partition checkpoints for rank 151 loading 8 zero partition checkpoints for rank 118 loading 8 zero partition checkpoints for rank 71 loading 8 zero partition checkpoints for rank 59 loading 8 zero partition checkpoints for rank 114 loading 8 zero partition checkpoints for rank 159 loading 8 zero partition checkpoints for rank 57 loading 8 zero partition checkpoints for rank 43 loading 8 zero partition checkpoints for rank 97 loading 8 zero partition checkpoints for rank 219 loading 8 zero partition checkpoints for rank 113 loading 8 zero partition checkpoints for rank 55 loading 8 zero partition checkpoints for rank 61 loading 8 zero 
partition checkpoints for rank 203 loading 8 zero partition checkpoints for rank 211 loading 8 zero partition checkpoints for rank 50 loading 8 zero partition checkpoints for rank 48 loading 8 zero partition checkpoints for rank 200 loading 8 zero partition checkpoints for rank 191 loading 8 zero partition checkpoints for rank 169 loading 8 zero partition checkpoints for rank 102 loading 8 zero partition checkpoints for rank 81 loading 8 zero partition checkpoints for rank 56 loading 8 zero partition checkpoints for rank 147 loading 8 zero partition checkpoints for rank 84 loading 8 zero partition checkpoints for rank 136 loading 8 zero partition checkpoints for rank 210 loading 8 zero partition checkpoints for rank 178 loading 8 zero partition checkpoints for rank 105 loading 8 zero partition checkpoints for rank 223 loading 8 zero partition checkpoints for rank 197 loading 8 zero partition checkpoints for rank 170 loading 8 zero partition checkpoints for rank 135 loading 8 zero partition checkpoints for rank 45 loading 8 zero partition checkpoints for rank 119 loading 8 zero partition checkpoints for rank 180 loading 8 zero partition checkpoints for rank 173 loading 8 zero partition checkpoints for rank 123 loading 8 zero partition checkpoints for rank 125 loading 8 zero partition checkpoints for rank 171 loading 8 zero partition checkpoints for rank 186 loading 8 zero partition checkpoints for rank 109 loading 8 zero partition checkpoints for rank 52 loading 8 zero partition checkpoints for rank 121 loading 8 zero partition checkpoints for rank 58 loading 8 zero partition checkpoints for rank 53 loading 8 zero partition checkpoints for rank 218 loading 8 zero partition checkpoints for rank 168 loading 8 zero partition checkpoints for rank 181 loading 8 zero partition checkpoints for rank 188 loading 8 zero partition checkpoints for rank 194 loading 8 zero partition checkpoints for rank 92 loading 8 zero partition checkpoints for rank 184 successfully loaded 8 
ZeRO state_dicts for rank 25 loading 8 zero partition checkpoints for rank 156 loading 8 zero partition checkpoints for rank 161 loading 8 zero partition checkpoints for rank 131 loading 8 zero partition checkpoints for rank 63 loading 8 zero partition checkpoints for rank 35 loading 8 zero partition checkpoints for rank 66 loading 8 zero partition checkpoints for rank 90 loading 8 zero partition checkpoints for rank 163 loading 8 zero partition checkpoints for rank 93 loading 8 zero partition checkpoints for rank 86 loading 8 zero partition checkpoints for rank 183 loading 8 zero partition checkpoints for rank 117 loading 8 zero partition checkpoints for rank 103 loading 8 zero partition checkpoints for rank 47 loading 8 zero partition checkpoints for rank 10 loading 8 zero partition checkpoints for rank 82 loading 8 zero partition checkpoints for rank 69 loading 8 zero partition checkpoints for rank 60 loading 8 zero partition checkpoints for rank 101 loading 8 zero partition checkpoints for rank 94 loading 8 zero partition checkpoints for rank 22 loading 8 zero partition checkpoints for rank 108 loading 8 zero partition checkpoints for rank 177 loading 8 zero partition checkpoints for rank 37 loading 8 zero partition checkpoints for rank 38 loading 8 zero partition checkpoints for rank 79 loading 8 zero partition checkpoints for rank 217 loading 8 zero partition checkpoints for rank 138 loading 8 zero partition checkpoints for rank 189 loading 8 zero partition checkpoints for rank 208 loading 8 zero partition checkpoints for rank 143 loading 8 zero partition checkpoints for rank 142 loading 8 zero partition checkpoints for rank 172 loading 8 zero partition checkpoints for rank 85 loading 8 zero partition checkpoints for rank 74 loading 8 zero partition checkpoints for rank 68 loading 8 zero partition checkpoints for rank 14 loading 8 zero partition checkpoints for rank 252 loading 8 zero partition checkpoints for rank 202 loading 8 zero partition checkpoints for 
rank 95 loading 8 zero partition checkpoints for rank 126 loading 8 zero partition checkpoints for rank 129 loading 8 zero partition checkpoints for rank 232 loading 8 zero partition checkpoints for rank 137 loading 8 zero partition checkpoints for rank 214 loading 8 zero partition checkpoints for rank 78 loading 8 zero partition checkpoints for rank 162 loading 8 zero partition checkpoints for rank 4 loading 8 zero partition checkpoints for rank 127 loading 8 zero partition checkpoints for rank 139 loading 8 zero partition checkpoints for rank 110 loading 8 zero partition checkpoints for rank 247 loading 8 zero partition checkpoints for rank 222 loading 8 zero partition checkpoints for rank 229 loading 8 zero partition checkpoints for rank 128 loading 8 zero partition checkpoints for rank 51 loading 8 zero partition checkpoints for rank 174 loading 8 zero partition checkpoints for rank 187 loading 8 zero partition checkpoints for rank 70 loading 8 zero partition checkpoints for rank 215 loading 8 zero partition checkpoints for rank 160 loading 8 zero partition checkpoints for rank 91 loading 8 zero partition checkpoints for rank 49 loading 8 zero partition checkpoints for rank 6 loading 8 zero partition checkpoints for rank 24 loading 8 zero partition checkpoints for rank 243 loading 8 zero partition checkpoints for rank 221 loading 8 zero partition checkpoints for rank 8 loading 8 zero partition checkpoints for rank 20 loading 8 zero partition checkpoints for rank 240 loading 8 zero partition checkpoints for rank 236 loading 8 zero partition checkpoints for rank 2 loading 8 zero partition checkpoints for rank 27 loading 8 zero partition checkpoints for rank 213 loading 8 zero partition checkpoints for rank 176 loading 8 zero partition checkpoints for rank 175 loading 8 zero partition checkpoints for rank 253 loading 8 zero partition checkpoints for rank 209 loading 8 zero partition checkpoints for rank 231 loading 8 zero partition checkpoints for rank 239 loading 
8 zero partition checkpoints for rank 88 loading 8 zero partition checkpoints for rank 28 loading 8 zero partition checkpoints for rank 179 loading 8 zero partition checkpoints for rank 185 loading 8 zero partition checkpoints for rank 13 loading 8 zero partition checkpoints for rank 233 loading 8 zero partition checkpoints for rank 11 loading 8 zero partition checkpoints for rank 246 loading 8 zero partition checkpoints for rank 9 loading 8 zero partition checkpoints for rank 224 loading 8 zero partition checkpoints for rank 248 loading 8 zero partition checkpoints for rank 251 loading 8 zero partition checkpoints for rank 1 loading 8 zero partition checkpoints for rank 29 loading 8 zero partition checkpoints for rank 235 loading 8 zero partition checkpoints for rank 250 loading 8 zero partition checkpoints for rank 23 loading 8 zero partition checkpoints for rank 244 loading 8 zero partition checkpoints for rank 241 loading 8 zero partition checkpoints for rank 225 loading 8 zero partition checkpoints for rank 18 loading 8 zero partition checkpoints for rank 234 loading 8 zero partition checkpoints for rank 3 loading 8 zero partition checkpoints for rank 242 loading 8 zero partition checkpoints for rank 0 checkpoint version 3.0 loading 8 zero partition checkpoints for rank 21 loading 8 zero partition checkpoints for rank 249 loading 8 zero partition checkpoints for rank 245 loading 8 zero partition checkpoints for rank 228 loading 8 zero partition checkpoints for rank 26 loading 8 zero partition checkpoints for rank 30 loading 8 zero partition checkpoints for rank 19 loading 8 zero partition checkpoints for rank 15 loading 8 zero partition checkpoints for rank 7 loading 8 zero partition checkpoints for rank 238 loading 8 zero partition checkpoints for rank 17 loading 8 zero partition checkpoints for rank 31 loading 8 zero partition checkpoints for rank 255 loading 8 zero partition checkpoints for rank 12 loading 8 zero partition checkpoints for rank 237 loading 8 
zero partition checkpoints for rank 16 loading 8 zero partition checkpoints for rank 254 loading 8 zero partition checkpoints for rank 230 loading 8 zero partition checkpoints for rank 5 loading 8 zero partition checkpoints for rank 25 loading 8 zero partition checkpoints for rank 226 loading 8 zero partition checkpoints for rank 227 successfully loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints at iteration 6210 time (ms) | load-checkpoint: 56578.08 [after model, optimizer, and learning rate scheduler are built] datetime: 2021-09-27 17:45:07 > building train, validation, and test datasets ... > datasets target sizes (minimum size): train: 300000000 validation: 1638400 test: 10240 > building train, validation, and test datasets for GPT ... > building dataset index ... reading sizes... reading pointers... reading document index... creating numpy buffer of mmap... creating memory view of numpy buffer... > finished creating indexed dataset in 0.174718 seconds number of documents: 304230423 > dataset split: train: document indices in [0, 288714672) total of 288714672 documents validation: document indices in [288714672, 303926193) total of 15211521 documents test: document indices in [303926193, 304230423) total of 304230 documents > WARNING: could not find index map files, building the indices on rank 0 ... 
 > last epoch number of samples (36925554) is smaller than 80% of number of samples per epoch (131537223), setting separate_last_epoch to True
WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-27 17:46:37 CEST)" was missed by 0:00:21.460713
 > elasped time to build and save doc-idx mapping (seconds): 74.353737
    using:
     number of documents:       288714672
     number of epochs:          3
     sequence length:           2048
     total number of samples:   394611669
WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-27 17:47:37 CEST)" was missed by 0:00:11.662010
 > elasped time to build and save sample-idx mapping (seconds): 24.775998
 > building shuffle index with split [0, 263074446) and [263074446, 394611669) ...
 > elasped time to build and save shuffle-idx mapping (seconds): 26.026031
 > loading doc-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_train_indexmap_300000000ns_2048sl_43s_doc_idx.npy
 > loading sample-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_train_indexmap_300000000ns_2048sl_43s_sample_idx.npy
 > loading shuffle-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_train_indexmap_300000000ns_2048sl_43s_shuffle_idx.npy
    loaded indexed file in 0.089 seconds
    total number of samples: 394611670
    total number of epochs: 3
 > WARNING: could not find index map files, building the indices on rank 0 ...
 > only one epoch required, setting separate_last_epoch to False
 > elasped time to build and save doc-idx mapping (seconds): 0.979826
    using:
     number of documents:       15211521
     number of epochs:          1
     sequence length:           2048
     total number of samples:   6927160
 > elasped time to build and save sample-idx mapping (seconds): 0.364344
 > building shuffle index with split [0, 6927160) and [6927160, 6927160) ...
 > elasped time to build and save shuffle-idx mapping (seconds): 0.312714
 > loading doc-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_valid_indexmap_1638400ns_2048sl_43s_doc_idx.npy
 > loading sample-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_valid_indexmap_1638400ns_2048sl_43s_sample_idx.npy
 > loading shuffle-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_valid_indexmap_1638400ns_2048sl_43s_shuffle_idx.npy
    loaded indexed file in 0.034 seconds
    total number of samples: 6927161
    total number of epochs: 1
 > WARNING: could not find index map files, building the indices on rank 0 ...
 > only one epoch required, setting separate_last_epoch to False
 > elasped time to build and save doc-idx mapping (seconds): 0.019056
    using:
     number of documents:       304230
     number of epochs:          1
     sequence length:           2048
     total number of samples:   137383
 > elasped time to build and save sample-idx mapping (seconds): 0.007505
 > building shuffle index with split [0, 137383) and [137383, 137383) ...
 > elasped time to build and save shuffle-idx mapping (seconds): 0.021865
 > loading doc-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_test_indexmap_10240ns_2048sl_43s_doc_idx.npy
 > loading sample-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_test_indexmap_10240ns_2048sl_43s_sample_idx.npy
 > loading shuffle-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_test_indexmap_10240ns_2048sl_43s_shuffle_idx.npy
    loaded indexed file in 0.110 seconds
    total number of samples: 137384
    total number of epochs: 1
 > finished creating GPT datasets ...
[after dataloaders are built] datetime: 2021-09-27 17:47:20
done with setup ...
training ...
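The train-split numbers in the build log above are internally consistent. As a sketch (using only figures printed in the log, and mirroring the shape of Megatron-LM's index-mapping logic rather than reproducing its actual code), the epoch count, total sample count, `separate_last_epoch` decision, and shuffle-index split boundary can all be derived from the target of 300000000 samples and the per-epoch sample count of 131537223:

```python
# Reproduce the train-split index numbers printed in the build log above.
# Every input figure is taken directly from the log; the formulas are a
# sketch of Megatron-LM's _build_index_mappings logic, not the actual code.

samples_per_epoch = 131_537_223   # printed in the separate_last_epoch warning
target_samples    = 300_000_000   # "datasets target sizes ... train: 300000000"

# Two full epochs are not enough to reach the target, so three are built.
num_epochs = 1
while num_epochs * samples_per_epoch < target_samples:
    num_epochs += 1
assert num_epochs == 3                       # log: "number of epochs: 3"

total_samples = num_epochs * samples_per_epoch
assert total_samples == 394_611_669          # log: "total number of samples: 394611669"

# Samples actually drawn from the final (partial) epoch:
from_first_epochs  = (num_epochs - 1) * samples_per_epoch
last_epoch_samples = target_samples - from_first_epochs
assert last_epoch_samples == 36_925_554      # log: "(36925554)"

# separate_last_epoch is True because the last epoch is used for less
# than 80% of its samples, so it gets its own shuffle range:
assert last_epoch_samples < 0.80 * samples_per_epoch

# The shuffle index is split at exactly that epoch boundary:
assert (from_first_epochs, total_samples) == (263_074_446, 394_611_669)
```

This also explains the split boundary in "building shuffle index with split [0, 263074446) and [263074446, 394611669)": the first range covers the two fully-consumed epochs, the second the partially-consumed third epoch.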
time (ms) | model-and-optimizer-setup: 64587.82 | train/valid/test-data-iterators-setup: 131511.20
[before the start of training step] datetime: 2021-09-27 17:47:20
[2021-09-27 17:47:20,277] [INFO] [checkpointing.py:408:forward] Activation Checkpointing Information
[2021-09-27 17:47:20,277] [INFO] [checkpointing.py:409:forward] ----Partition Activations False, CPU CHECKPOINTING False
[2021-09-27 17:47:20,277] [INFO] [checkpointing.py:412:forward] ----contiguous Memory Checkpointing False with 32 total layers
[2021-09-27 17:47:20,277] [INFO] [checkpointing.py:415:forward] ----Synchronization False
[2021-09-27 17:47:20,277] [INFO] [checkpointing.py:416:forward] ----Profiling time in checkpointing False
[Rank 0] (after 6220 iterations) memory (MB) | allocated: 6689.83056640625 | max allocated: 13899.01416015625 | reserved: 23246.0 | max reserved: 23246.0
[Rank 224] (after 6220 iterations) memory (MB) | allocated: 7107.7119140625 | max allocated: 11885.68994140625 | reserved: 22492.0 | max reserved: 22492.0
[... matching "(after 6220 iterations) memory (MB)" reports for ranks 1-3, 32-35, 64-67, 96-99, 128-131, 160-163, 192-195, and 225-227: allocated 5861.6-7107.7 MB, max allocated 10722.5-13899.0 MB, max reserved 18826.0-23278.0 MB ...]
iteration 6220/ 159576 | consumed samples: 194400 | elapsed time per iteration (ms): 19180.4 | learning rate: 5.378E-05 | global batch size: 80 | lm loss: 6.355129E+00 | loss scale: 4096.0 | grad norm: 93535.397 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6230/ 159576 | consumed samples: 195200 | elapsed time per iteration (ms): 17628.9 | learning rate: 5.400E-05 | global batch size: 80 | lm loss: 6.325471E+00 | loss scale: 4096.0 | grad norm: 104626.566 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6240/ 159576 | consumed samples: 196000 | elapsed time per iteration (ms): 17585.3 | learning rate: 5.423E-05 | global batch size: 80 | lm loss: 6.313773E+00 | loss scale: 4096.0 | grad norm: 104488.785 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6250/ 159576 | consumed samples: 196800 | elapsed time per iteration (ms): 17683.9 | learning rate: 5.445E-05 | global batch size: 80 | lm loss: 6.302388E+00 | loss scale: 4096.0 | grad norm: 99404.120 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6260/ 159576 | consumed samples: 197600 | elapsed time per iteration (ms): 17834.3 | learning rate: 5.467E-05 | global batch size: 80 | lm loss: 6.322264E+00 | loss scale: 4096.0 | grad norm: 134601.608 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6270/ 159576 | consumed samples: 198400 | elapsed time per iteration (ms): 17647.5 | learning rate: 5.489E-05 | global batch size: 80 | lm loss: 6.319476E+00 | loss scale: 4096.0 | grad norm: 142879.794 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6280/ 159576 | consumed samples: 199200 | elapsed time per iteration (ms): 17607.4 | learning rate: 5.511E-05 | global batch size: 80 | lm loss: 6.321982E+00 | loss scale: 4096.0 | grad norm: 114136.314 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6290/ 159576 | consumed samples: 200000 | elapsed time per iteration (ms): 17636.6 | learning rate: 5.534E-05 | global batch size: 80 | lm loss: 6.272703E+00 | loss scale: 4096.0 | grad norm: 101011.949 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6300/ 159576 | consumed samples: 200800 | elapsed time per iteration (ms): 17537.9 | learning rate: 5.556E-05 | global batch size: 80 | lm loss: 6.295881E+00 | loss scale: 4096.0 | grad norm: 116874.031 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6310/ 159576 | consumed samples: 201600 | elapsed time per iteration (ms): 17634.4 | learning rate: 5.578E-05 | global batch size: 80 | lm loss: 6.324175E+00 | loss scale: 4096.0 | grad norm: 115938.037 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6320/ 159576 | consumed samples: 202400 | elapsed time per iteration (ms): 17796.6 | learning rate: 5.600E-05 | global batch size: 80 | lm loss: 6.301260E+00 | loss scale: 4096.0 | grad norm: 128639.863 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6330/ 159576 | consumed samples: 203200 | elapsed time per iteration (ms): 17684.4 | learning rate: 5.622E-05 | global batch size: 80 | lm loss: 6.325212E+00 | loss scale: 4096.0 | grad norm: 122331.136 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6340/ 159576 | consumed samples: 204000 | elapsed time per iteration (ms): 17751.1 | learning rate: 5.645E-05 | global batch size: 80 | lm loss: 6.315152E+00 | loss scale: 4096.0 | grad norm: 107257.166 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
[2021-09-27 18:28:25] PULSE: tr8-104B is running for 44:59 since 2021-09-27T17:43:26 (1271196 on 'gpu_p13' partition (r7i7n[6-8],r8i0n[0-8],r8i1n[0-4],r8i7n[3-8],r9i0n[0-6,8],r9i1n[0-8],r9i2n0,r9i4n8,r9i5n[0-8],r9i6n[0-8],r9i7n[3-6])
iteration 6350/ 159576 | consumed samples: 204800 | elapsed time per iteration (ms): 17472.1 | learning rate: 5.667E-05 | global batch size: 80 | lm loss: 6.305837E+00 | loss scale: 4096.0 | grad norm: 92922.842 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6360/ 159576 | consumed samples: 205600 | elapsed time per iteration (ms): 17585.4 | learning rate: 5.689E-05 | global batch size: 80 | lm loss: 6.291708E+00 | loss scale: 4096.0 | grad norm: 128015.015 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6370/ 159576 | consumed samples: 206400 | elapsed time per iteration (ms): 17756.4 | learning rate: 5.711E-05 | global batch size: 80 | lm loss: 6.336868E+00 | loss scale: 4096.0 | grad norm: 132675.737 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6380/ 159576 | consumed samples: 207200 | elapsed time per iteration (ms): 17470.3 | learning rate: 5.733E-05 | global batch size: 80 | lm loss: 6.319473E+00 | loss scale: 4096.0 | grad norm: 121903.409 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6390/ 159576 | consumed samples: 208000 | elapsed time per iteration (ms): 17849.6 | learning rate: 5.755E-05 | global batch size: 80 | lm loss: 6.295473E+00 | loss scale: 4096.0 | grad norm: 108842.830 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6400/ 159576 | consumed samples: 208800 | elapsed time per iteration (ms): 17525.6 | learning rate: 5.778E-05 | global batch size: 80 | lm loss: 6.305953E+00 | loss scale: 4096.0 | grad norm: 110142.091 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6410/ 159576 | consumed samples: 209600 | elapsed time per iteration (ms): 17695.6 | learning rate: 5.800E-05 | global batch size: 80 | lm loss: 6.327058E+00 | loss scale: 4096.0 | grad norm: 149204.626 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6420/ 159576 | consumed samples: 210400 | elapsed time per iteration (ms): 17590.8 | learning rate: 5.822E-05 | global batch size: 80 | lm loss: 6.301820E+00 | loss scale: 4096.0 | grad norm: 90947.052 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6430/ 159576 | consumed samples: 211200 | elapsed time per iteration (ms): 17793.7 | learning rate: 5.844E-05 | global batch size: 80 | lm loss: 6.343626E+00 | loss scale: 4096.0 | grad norm: 345234.052 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6440/ 159576 | consumed samples: 212000 | elapsed time per iteration (ms): 17631.2 | learning rate: 5.866E-05 | global batch size: 80 | lm loss: 6.323440E+00 | loss scale: 4096.0 | grad norm: 96087.714 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6450/ 159576 | consumed samples: 212800 | elapsed time per iteration (ms): 17688.1 | learning rate: 5.889E-05 | global batch size: 80 | lm loss: 6.310754E+00 | loss scale: 4096.0 | grad norm: 142702.659 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6460/ 159576 | consumed samples: 213600 | elapsed time per iteration (ms): 17884.9 | learning rate: 5.911E-05 | global batch size: 80 | lm loss: 6.326996E+00 | loss scale: 4096.0 | grad norm: 139353.302 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6470/ 159576 | consumed samples: 214400 | elapsed time per iteration (ms): 17777.5 | learning rate: 5.933E-05 | global batch size: 80 | lm loss: 6.303541E+00 | loss scale: 4096.0 | grad norm: 163735.847 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6480/ 159576 | consumed samples: 215200 | elapsed time per iteration (ms): 17758.4 | learning rate: 5.955E-05 |
global batch size: 80 | lm loss: 6.318764E+00 | loss scale: 4096.0 | grad norm: 122570.514 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 6490/ 159576 | consumed samples: 216000 | elapsed time per iteration (ms): 17864.1 | learning rate: 5.977E-05 | global batch size: 80 | lm loss: 6.307048E+00 | loss scale: 4096.0 | grad norm: 116946.724 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 6500/ 159576 | consumed samples: 216800 | elapsed time per iteration (ms): 17901.7 | learning rate: 6.000E-05 | global batch size: 80 | lm loss: 6.315722E+00 | loss scale: 4096.0 | grad norm: 93922.032 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 6510/ 159576 | consumed samples: 217600 | elapsed time per iteration (ms): 17582.8 | learning rate: 6.000E-05 | global batch size: 80 | lm loss: 6.323491E+00 | loss scale: 4096.0 | grad norm: 148357.794 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 6520/ 159576 | consumed samples: 218400 | elapsed time per iteration (ms): 17725.2 | learning rate: 6.000E-05 | global batch size: 80 | lm loss: 6.330975E+00 | loss scale: 4096.0 | grad norm: 103909.494 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 6530/ 159576 | consumed samples: 219200 | elapsed time per iteration (ms): 17788.4 | learning rate: 6.000E-05 | global batch size: 80 | lm loss: 6.330465E+00 | loss scale: 4096.0 | grad norm: 112690.620 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 6540/ 159576 | consumed samples: 220000 | elapsed time per iteration (ms): 17722.2 | learning rate: 6.000E-05 | global batch size: 80 | lm loss: 6.325342E+00 | loss scale: 4096.0 | grad norm: 74738.856 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan 
iterations: 0 | time (ms) iteration 6550/ 159576 | consumed samples: 220800 | elapsed time per iteration (ms): 17778.1 | learning rate: 6.000E-05 | global batch size: 80 | lm loss: 6.338161E+00 | loss scale: 4096.0 | grad norm: 92386.024 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) [2021-09-27 19:28:18] PULSE: tr8-104B is running for 1:44:52 since 2021-09-27T17:43:26 (1271196 on 'gpu_p13' partition (r7i7n[6-8],r8i0n[0-8],r8i1n[0-4],r8i7n[3-8],r9i0n[0-6,8],r9i1n[0-8],r9i2n0,r9i4n8,r9i5n[0-8],r9i6n[0-8],r9i7n[3-6]) iteration 6560/ 159576 | consumed samples: 221600 | elapsed time per iteration (ms): 17633.8 | learning rate: 6.000E-05 | global batch size: 80 | lm loss: 6.346842E+00 | loss scale: 4096.0 | grad norm: 91412.181 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 6570/ 159576 | consumed samples: 222400 | elapsed time per iteration (ms): 17585.9 | learning rate: 6.000E-05 | global batch size: 80 | lm loss: 6.354182E+00 | loss scale: 4096.0 | grad norm: 106016.821 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 6580/ 159576 | consumed samples: 223200 | elapsed time per iteration (ms): 17723.8 | learning rate: 6.000E-05 | global batch size: 80 | lm loss: 6.339022E+00 | loss scale: 4096.0 | grad norm: 99292.123 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 6590/ 159576 | consumed samples: 224000 | elapsed time per iteration (ms): 17636.7 | learning rate: 6.000E-05 | global batch size: 80 | lm loss: 6.343359E+00 | loss scale: 4096.0 | grad norm: 142334.413 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 6600/ 159576 | consumed samples: 224800 | elapsed time per iteration (ms): 17663.9 | learning rate: 6.000E-05 | global batch size: 80 | lm loss: 6.340461E+00 | loss scale: 4096.0 | grad norm: 
152141.320 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 6610/ 159576 | consumed samples: 225600 | elapsed time per iteration (ms): 17548.3 | learning rate: 6.000E-05 | global batch size: 80 | lm loss: 6.323914E+00 | loss scale: 4096.0 | grad norm: 170495.198 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 6620/ 159576 | consumed samples: 226400 | elapsed time per iteration (ms): 17566.2 | learning rate: 6.000E-05 | global batch size: 80 | lm loss: 6.304215E+00 | loss scale: 4096.0 | grad norm: 160242.764 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 6630/ 159576 | consumed samples: 227200 | elapsed time per iteration (ms): 17951.1 | learning rate: 6.000E-05 | global batch size: 80 | lm loss: 6.312865E+00 | loss scale: 4096.0 | grad norm: 104923.640 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 6640/ 159576 | consumed samples: 228000 | elapsed time per iteration (ms): 17693.7 | learning rate: 6.000E-05 | global batch size: 80 | lm loss: 6.337115E+00 | loss scale: 4096.0 | grad norm: 162544.865 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 6650/ 159576 | consumed samples: 228800 | elapsed time per iteration (ms): 17707.3 | learning rate: 6.000E-05 | global batch size: 80 | lm loss: 6.327879E+00 | loss scale: 4096.0 | grad norm: 80497.049 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 6660/ 159576 | consumed samples: 229600 | elapsed time per iteration (ms): 17584.5 | learning rate: 6.000E-05 | global batch size: 80 | lm loss: 6.404206E+00 | loss scale: 4096.0 | grad norm: 136886.090 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 6670/ 159576 | consumed samples: 230400 | elapsed 
time per iteration (ms): 17615.2 | learning rate: 6.000E-05 | global batch size: 80 | lm loss: 6.359778E+00 | loss scale: 4096.0 | grad norm: 123501.796 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 6680/ 159576 | consumed samples: 231200 | elapsed time per iteration (ms): 17812.0 | learning rate: 6.000E-05 | global batch size: 80 | lm loss: 6.318851E+00 | loss scale: 4096.0 | grad norm: 118146.851 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 6690/ 159576 | consumed samples: 232000 | elapsed time per iteration (ms): 17690.8 | learning rate: 6.000E-05 | global batch size: 80 | lm loss: 6.324978E+00 | loss scale: 4096.0 | grad norm: 127513.155 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 6700/ 159576 | consumed samples: 232800 | elapsed time per iteration (ms): 17679.3 | learning rate: 6.000E-05 | global batch size: 80 | lm loss: 6.312429E+00 | loss scale: 4096.0 | grad norm: 141251.517 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 6710/ 159576 | consumed samples: 233600 | elapsed time per iteration (ms): 17730.1 | learning rate: 6.000E-05 | global batch size: 80 | lm loss: 6.304575E+00 | loss scale: 8192.0 | grad norm: 354806.488 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 6720/ 159576 | consumed samples: 234400 | elapsed time per iteration (ms): 17817.5 | learning rate: 6.000E-05 | global batch size: 80 | lm loss: 6.343853E+00 | loss scale: 8192.0 | grad norm: 400003.537 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 6730/ 159576 | consumed samples: 235200 | elapsed time per iteration (ms): 17886.0 | learning rate: 6.000E-05 | global batch size: 80 | lm loss: 6.329220E+00 | loss scale: 8192.0 | grad norm: 354798.775 | num 
zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 6740/ 159576 | consumed samples: 236000 | elapsed time per iteration (ms): 17869.3 | learning rate: 6.000E-05 | global batch size: 80 | lm loss: 6.341031E+00 | loss scale: 8192.0 | grad norm: 452433.886 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 6750/ 159576 | consumed samples: 236912 | elapsed time per iteration (ms): 18328.8 | learning rate: 6.000E-05 | global batch size: 96 | lm loss: 6.325079E+00 | loss scale: 8192.0 | grad norm: 272354.067 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 6760/ 159576 | consumed samples: 237872 | elapsed time per iteration (ms): 17158.6 | learning rate: 6.000E-05 | global batch size: 96 | lm loss: 6.350076E+00 | loss scale: 4096.0 | grad norm: 109464.543 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) [2021-09-27 20:32:07] PULSE: tr8-104B is running for 2:48:41 since 2021-09-27T17:43:26 (1271196 on 'gpu_p13' partition (r7i7n[6-8],r8i0n[0-8],r8i1n[0-4],r8i7n[3-8],r9i0n[0-6,8],r9i1n[0-8],r9i2n0,r9i4n8,r9i5n[0-8],r9i6n[0-8],r9i7n[3-6]) iteration 6770/ 159576 | consumed samples: 238832 | elapsed time per iteration (ms): 18779.7 | learning rate: 6.000E-05 | global batch size: 96 | lm loss: 6.347258E+00 | loss scale: 4096.0 | grad norm: 151362.578 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 6780/ 159576 | consumed samples: 239792 | elapsed time per iteration (ms): 18764.2 | learning rate: 6.000E-05 | global batch size: 96 | lm loss: 6.483617E+00 | loss scale: 4096.0 | grad norm: 144409.728 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 6790/ 159576 | consumed samples: 240752 | elapsed time per iteration (ms): 18830.0 | learning rate: 6.000E-05 | global batch size: 96 | 
lm loss: 6.459402E+00 | loss scale: 4096.0 | grad norm: 106762.239 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 6800/ 159576 | consumed samples: 241712 | elapsed time per iteration (ms): 18594.7 | learning rate: 6.000E-05 | global batch size: 96 | lm loss: 6.457979E+00 | loss scale: 4096.0 | grad norm: 159826.924 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 6810/ 159576 | consumed samples: 242672 | elapsed time per iteration (ms): 18590.0 | learning rate: 6.000E-05 | global batch size: 96 | lm loss: 6.445743E+00 | loss scale: 4096.0 | grad norm: 104586.355 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 6820/ 159576 | consumed samples: 243632 | elapsed time per iteration (ms): 18726.4 | learning rate: 6.000E-05 | global batch size: 96 | lm loss: 6.371418E+00 | loss scale: 4096.0 | grad norm: 181059.362 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 6830/ 159576 | consumed samples: 244592 | elapsed time per iteration (ms): 18734.3 | learning rate: 6.000E-05 | global batch size: 96 | lm loss: 6.385859E+00 | loss scale: 4096.0 | grad norm: 126958.593 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 6840/ 159576 | consumed samples: 245552 | elapsed time per iteration (ms): 18634.7 | learning rate: 6.000E-05 | global batch size: 96 | lm loss: 6.351850E+00 | loss scale: 4096.0 | grad norm: 154126.591 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 6850/ 159576 | consumed samples: 246512 | elapsed time per iteration (ms): 18587.1 | learning rate: 6.000E-05 | global batch size: 96 | lm loss: 6.341198E+00 | loss scale: 4096.0 | grad norm: 133262.949 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) 
iteration 6860/ 159576 | consumed samples: 247472 | elapsed time per iteration (ms): 19013.7 | learning rate: 6.000E-05 | global batch size: 96 | lm loss: 6.317137E+00 | loss scale: 4096.0 | grad norm: 101860.571 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 6870/ 159576 | consumed samples: 248432 | elapsed time per iteration (ms): 18789.2 | learning rate: 6.000E-05 | global batch size: 96 | lm loss: 6.332655E+00 | loss scale: 4096.0 | grad norm: 467416.787 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 6880/ 159576 | consumed samples: 249392 | elapsed time per iteration (ms): 18654.5 | learning rate: 6.000E-05 | global batch size: 96 | lm loss: 6.385090E+00 | loss scale: 4096.0 | grad norm: 154062.615 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 6890/ 159576 | consumed samples: 250352 | elapsed time per iteration (ms): 18644.4 | learning rate: 6.000E-05 | global batch size: 96 | lm loss: 6.355402E+00 | loss scale: 4096.0 | grad norm: 154349.296 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 6900/ 159576 | consumed samples: 251312 | elapsed time per iteration (ms): 18495.6 | learning rate: 6.000E-05 | global batch size: 96 | lm loss: 6.365808E+00 | loss scale: 4096.0 | grad norm: 95313.572 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 6910/ 159576 | consumed samples: 252272 | elapsed time per iteration (ms): 18802.1 | learning rate: 6.000E-05 | global batch size: 96 | lm loss: 6.598378E+00 | loss scale: 4096.0 | grad norm: 84678.880 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 6920/ 159576 | consumed samples: 253232 | elapsed time per iteration (ms): 18641.0 | learning rate: 6.000E-05 | global batch size: 96 | lm loss: 
7.314456E+00 | loss scale: 4096.0 | grad norm: 122716.232 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 6930/ 159576 | consumed samples: 254192 | elapsed time per iteration (ms): 18564.1 | learning rate: 6.000E-05 | global batch size: 96 | lm loss: 9.121927E+00 | loss scale: 4096.0 | grad norm: 283384.130 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 6940/ 159576 | consumed samples: 255152 | elapsed time per iteration (ms): 18549.7 | learning rate: 6.000E-05 | global batch size: 96 | lm loss: 1.023865E+01 | loss scale: 4096.0 | grad norm: 42359.376 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 6950/ 159576 | consumed samples: 256112 | elapsed time per iteration (ms): 17675.8 | learning rate: 6.000E-05 | global batch size: 96 | lm loss: 9.249577E+00 | loss scale: 2048.0 | grad norm: 78368.205 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 6960/ 159576 | consumed samples: 257072 | elapsed time per iteration (ms): 18443.5 | learning rate: 6.000E-05 | global batch size: 96 | lm loss: 8.389180E+00 | loss scale: 2048.0 | grad norm: 40490.259 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 6970/ 159576 | consumed samples: 258032 | elapsed time per iteration (ms): 18545.1 | learning rate: 6.000E-05 | global batch size: 96 | lm loss: 7.529938E+00 | loss scale: 2048.0 | grad norm: 14218.251 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) [2021-09-27 21:35:01] PULSE: tr8-104B is running for 3:51:35 since 2021-09-27T17:43:26 (1271196 on 'gpu_p13' partition (r7i7n[6-8],r8i0n[0-8],r8i1n[0-4],r8i7n[3-8],r9i0n[0-6,8],r9i1n[0-8],r9i2n0,r9i4n8,r9i5n[0-8],r9i6n[0-8],r9i7n[3-6]) iteration 6980/ 159576 | consumed samples: 258992 | elapsed time per iteration (ms): 
18379.4 | learning rate: 6.000E-05 | global batch size: 96 | lm loss: 7.102215E+00 | loss scale: 2048.0 | grad norm: 18580.148 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 6990/ 159576 | consumed samples: 259952 | elapsed time per iteration (ms): 18355.5 | learning rate: 6.000E-05 | global batch size: 96 | lm loss: 7.018941E+00 | loss scale: 2048.0 | grad norm: 17882.180 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 7000/ 159576 | consumed samples: 260912 | elapsed time per iteration (ms): 18505.9 | learning rate: 6.000E-05 | global batch size: 96 | lm loss: 6.942125E+00 | loss scale: 2048.0 | grad norm: 26860.562 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) ------------------------------------------------------------------------------------------------ validation loss at iteration 7000 | lm loss value: 6.872679E+00 | lm loss PPL: 9.655315E+02 | ------------------------------------------------------------------------------------------------ iteration 7010/ 159576 | consumed samples: 261872 | elapsed time per iteration (ms): 19970.7 | learning rate: 6.000E-05 | global batch size: 96 | lm loss: 6.816376E+00 | loss scale: 2048.0 | grad norm: 40294.075 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 7020/ 159576 | consumed samples: 262832 | elapsed time per iteration (ms): 18648.1 | learning rate: 6.000E-05 | global batch size: 96 | lm loss: 6.821559E+00 | loss scale: 2048.0 | grad norm: 25012.263 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 7030/ 159576 | consumed samples: 263792 | elapsed time per iteration (ms): 18478.0 | learning rate: 6.000E-05 | global batch size: 96 | lm loss: 6.893867E+00 | loss scale: 2048.0 | grad norm: 39565.380 | num zeros: 0.0 | number of skipped iterations: 0 | number 
of nan iterations: 0 | time (ms) iteration 7040/ 159576 | consumed samples: 264752 | elapsed time per iteration (ms): 18670.1 | learning rate: 6.000E-05 | global batch size: 96 | lm loss: 6.871474E+00 | loss scale: 2048.0 | grad norm: 22832.888 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 7050/ 159576 | consumed samples: 265712 | elapsed time per iteration (ms): 18521.8 | learning rate: 6.000E-05 | global batch size: 96 | lm loss: 6.875928E+00 | loss scale: 2048.0 | grad norm: 26237.022 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 7060/ 159576 | consumed samples: 266672 | elapsed time per iteration (ms): 18543.5 | learning rate: 6.000E-05 | global batch size: 96 | lm loss: 6.827568E+00 | loss scale: 2048.0 | grad norm: 31639.445 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 7070/ 159576 | consumed samples: 267632 | elapsed time per iteration (ms): 18564.4 | learning rate: 6.000E-05 | global batch size: 96 | lm loss: 6.711889E+00 | loss scale: 2048.0 | grad norm: 46310.481 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 7080/ 159576 | consumed samples: 268592 | elapsed time per iteration (ms): 18629.8 | learning rate: 6.000E-05 | global batch size: 96 | lm loss: 6.683693E+00 | loss scale: 2048.0 | grad norm: 31484.550 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 7090/ 159576 | consumed samples: 269552 | elapsed time per iteration (ms): 18473.8 | learning rate: 6.000E-05 | global batch size: 96 | lm loss: 6.627121E+00 | loss scale: 2048.0 | grad norm: 45017.258 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 7100/ 159576 | consumed samples: 270512 | elapsed time per iteration (ms): 18806.7 | learning rate: 6.000E-05 | global batch 
size: 96 | lm loss: 6.627071E+00 | loss scale: 2048.0 | grad norm: 57880.707 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 7110/ 159576 | consumed samples: 271472 | elapsed time per iteration (ms): 18537.3 | learning rate: 6.000E-05 | global batch size: 96 | lm loss: 6.608931E+00 | loss scale: 2048.0 | grad norm: 67724.648 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 7120/ 159576 | consumed samples: 272432 | elapsed time per iteration (ms): 18556.3 | learning rate: 6.000E-05 | global batch size: 96 | lm loss: 6.592625E+00 | loss scale: 2048.0 | grad norm: 67655.063 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 7130/ 159576 | consumed samples: 273392 | elapsed time per iteration (ms): 18620.1 | learning rate: 6.000E-05 | global batch size: 96 | lm loss: 6.769730E+00 | loss scale: 2048.0 | grad norm: 50594.550 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 7140/ 159576 | consumed samples: 274352 | elapsed time per iteration (ms): 18517.9 | learning rate: 6.000E-05 | global batch size: 96 | lm loss: 6.749163E+00 | loss scale: 2048.0 | grad norm: 30940.535 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 7150/ 159576 | consumed samples: 275312 | elapsed time per iteration (ms): 18726.4 | learning rate: 6.000E-05 | global batch size: 96 | lm loss: 6.695554E+00 | loss scale: 2048.0 | grad norm: 49756.042 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) [2021-09-27 22:31:42] PULSE: tr8-104B is running for 4:48:16 since 2021-09-27T17:43:26 (1271196 on 'gpu_p13' partition (r7i7n[6-8],r8i0n[0-8],r8i1n[0-4],r8i7n[3-8],r9i0n[0-6,8],r9i1n[0-8],r9i2n0,r9i4n8,r9i5n[0-8],r9i6n[0-8],r9i7n[3-6]) iteration 7160/ 159576 | consumed samples: 276272 | elapsed time per 
iteration (ms): 18567.4 | learning rate: 6.000E-05 | global batch size: 96 | lm loss: 6.630823E+00 | loss scale: 2048.0 | grad norm: 46573.225 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 7170/ 159576 | consumed samples: 277232 | elapsed time per iteration (ms): 18787.6 | learning rate: 6.000E-05 | global batch size: 96 | lm loss: 6.637067E+00 | loss scale: 2048.0 | grad norm: 47650.692 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 7180/ 159576 | consumed samples: 278192 | elapsed time per iteration (ms): 18669.9 | learning rate: 6.000E-05 | global batch size: 96 | lm loss: 6.663966E+00 | loss scale: 2048.0 | grad norm: 54677.698 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 7190/ 159576 | consumed samples: 279152 | elapsed time per iteration (ms): 18711.4 | learning rate: 6.000E-05 | global batch size: 96 | lm loss: 6.603532E+00 | loss scale: 2048.0 | grad norm: 75914.515 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 7200/ 159576 | consumed samples: 280112 | elapsed time per iteration (ms): 18682.4 | learning rate: 6.000E-05 | global batch size: 96 | lm loss: 6.571133E+00 | loss scale: 2048.0 | grad norm: 74379.166 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 7210/ 159576 | consumed samples: 281072 | elapsed time per iteration (ms): 18622.6 | learning rate: 6.000E-05 | global batch size: 96 | lm loss: 6.584048E+00 | loss scale: 2048.0 | grad norm: 75888.414 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 7220/ 159576 | consumed samples: 282032 | elapsed time per iteration (ms): 18555.6 | learning rate: 6.000E-05 | global batch size: 96 | lm loss: 6.554535E+00 | loss scale: 2048.0 | grad norm: 90934.334 | num zeros: 0.0 | number 
of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 7230/ 159576 | consumed samples: 282992 | elapsed time per iteration (ms): 18600.5 | learning rate: 6.000E-05 | global batch size: 96 | lm loss: 6.558411E+00 | loss scale: 2048.0 | grad norm: 54832.822 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 7240/ 159576 | consumed samples: 284032 | elapsed time per iteration (ms): 19119.6 | learning rate: 6.000E-05 | global batch size: 112 | lm loss: 6.585645E+00 | loss scale: 2048.0 | grad norm: 116769.600 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 7250/ 159576 | consumed samples: 285152 | elapsed time per iteration (ms): 19421.9 | learning rate: 6.000E-05 | global batch size: 112 | lm loss: 6.554094E+00 | loss scale: 2048.0 | grad norm: 79780.312 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 7260/ 159576 | consumed samples: 286272 | elapsed time per iteration (ms): 19643.2 | learning rate: 6.000E-05 | global batch size: 112 | lm loss: 6.545351E+00 | loss scale: 2048.0 | grad norm: 153165.121 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 7270/ 159576 | consumed samples: 287392 | elapsed time per iteration (ms): 19873.2 | learning rate: 6.000E-05 | global batch size: 112 | lm loss: 6.548807E+00 | loss scale: 2048.0 | grad norm: 96725.418 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 7280/ 159576 | consumed samples: 288512 | elapsed time per iteration (ms): 19830.3 | learning rate: 6.000E-05 | global batch size: 112 | lm loss: 6.532312E+00 | loss scale: 2048.0 | grad norm: 85054.846 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 7290/ 159576 | consumed samples: 289632 | elapsed time per iteration (ms): 19469.1 | 
learning rate: 6.000E-05 | global batch size: 112 | lm loss: 6.535855E+00 | loss scale: 2048.0 | grad norm: 66255.480 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7300/ 159576 | consumed samples: 290752 | elapsed time per iteration (ms): 19578.9 | learning rate: 6.000E-05 | global batch size: 112 | lm loss: 6.583752E+00 | loss scale: 2048.0 | grad norm: 61901.507 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7310/ 159576 | consumed samples: 291872 | elapsed time per iteration (ms): 19646.2 | learning rate: 6.000E-05 | global batch size: 112 | lm loss: 6.539584E+00 | loss scale: 2048.0 | grad norm: 68238.513 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7320/ 159576 | consumed samples: 292992 | elapsed time per iteration (ms): 19642.5 | learning rate: 6.000E-05 | global batch size: 112 | lm loss: 6.526649E+00 | loss scale: 2048.0 | grad norm: 69527.941 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7330/ 159576 | consumed samples: 294112 | elapsed time per iteration (ms): 19508.3 | learning rate: 6.000E-05 | global batch size: 112 | lm loss: 6.514026E+00 | loss scale: 2048.0 | grad norm: 63745.755 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7340/ 159576 | consumed samples: 295232 | elapsed time per iteration (ms): 19676.4 | learning rate: 6.000E-05 | global batch size: 112 | lm loss: 6.519949E+00 | loss scale: 2048.0 | grad norm: 96730.566 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
[2021-09-27 23:32:04] PULSE: tr8-104B is running for 5:48:38 since 2021-09-27T17:43:26 (1271196 on 'gpu_p13' partition (r7i7n[6-8],r8i0n[0-8],r8i1n[0-4],r8i7n[3-8],r9i0n[0-6,8],r9i1n[0-8],r9i2n0,r9i4n8,r9i5n[0-8],r9i6n[0-8],r9i7n[3-6])
iteration 7350/ 159576 | consumed samples: 296352 | elapsed time per iteration (ms): 19539.0 | learning rate: 6.000E-05 | global batch size: 112 | lm loss: 6.510521E+00 | loss scale: 2048.0 | grad norm: 95201.544 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7360/ 159576 | consumed samples: 297472 | elapsed time per iteration (ms): 19834.3 | learning rate: 6.000E-05 | global batch size: 112 | lm loss: 6.532115E+00 | loss scale: 2048.0 | grad norm: 269153.773 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7370/ 159576 | consumed samples: 298592 | elapsed time per iteration (ms): 19564.2 | learning rate: 6.000E-05 | global batch size: 112 | lm loss: 6.501956E+00 | loss scale: 2048.0 | grad norm: 89998.728 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7380/ 159576 | consumed samples: 299712 | elapsed time per iteration (ms): 19672.7 | learning rate: 6.000E-05 | global batch size: 112 | lm loss: 6.522272E+00 | loss scale: 2048.0 | grad norm: 75724.702 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7390/ 159576 | consumed samples: 300832 | elapsed time per iteration (ms): 19562.0 | learning rate: 6.000E-05 | global batch size: 112 | lm loss: 6.511443E+00 | loss scale: 2048.0 | grad norm: 89537.752 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7400/ 159576 | consumed samples: 301952 | elapsed time per iteration (ms): 19728.4 | learning rate: 6.000E-05 | global batch size: 112 | lm loss: 6.534271E+00 | loss scale: 2048.0 | grad norm: 79036.616 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7410/ 159576 | consumed samples: 303072 | elapsed time per iteration (ms): 19731.8 | learning rate: 6.000E-05 | global batch size: 112 | lm loss: 6.550716E+00 | loss scale: 2048.0 | grad norm: 60002.009 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7420/ 159576 | consumed samples: 304192 | elapsed time per iteration (ms): 19733.0 | learning rate: 6.000E-05 | global batch size: 112 | lm loss: 6.546501E+00 | loss scale: 2048.0 | grad norm: 69147.056 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7430/ 159576 | consumed samples: 305312 | elapsed time per iteration (ms): 19483.2 | learning rate: 6.000E-05 | global batch size: 112 | lm loss: 6.560014E+00 | loss scale: 2048.0 | grad norm: 75450.439 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7440/ 159576 | consumed samples: 306432 | elapsed time per iteration (ms): 19613.5 | learning rate: 6.000E-05 | global batch size: 112 | lm loss: 6.523249E+00 | loss scale: 2048.0 | grad norm: 104393.288 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7450/ 159576 | consumed samples: 307552 | elapsed time per iteration (ms): 19763.9 | learning rate: 6.000E-05 | global batch size: 112 | lm loss: 6.510474E+00 | loss scale: 4096.0 | grad norm: 189305.762 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7460/ 159576 | consumed samples: 308672 | elapsed time per iteration (ms): 19871.2 | learning rate: 6.000E-05 | global batch size: 112 | lm loss: 6.501906E+00 | loss scale: 4096.0 | grad norm: 277069.826 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7470/ 159576 | consumed samples: 309792 | elapsed time per iteration (ms): 18903.0 | learning rate: 6.000E-05 | global batch size: 112 | lm loss: 6.497433E+00 | loss scale: 4096.0 | grad norm: 225644.862 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7480/ 159576 | consumed samples: 310912 | elapsed time per iteration (ms): 19707.4 | learning rate: 6.000E-05 | global batch size: 112 | lm loss: 6.488033E+00 | loss scale: 4096.0 | grad norm: 230163.205 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7490/ 159576 | consumed samples: 312032 | elapsed time per iteration (ms): 19720.9 | learning rate: 6.000E-05 | global batch size: 112 | lm loss: 6.505843E+00 | loss scale: 4096.0 | grad norm: 238654.612 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7500/ 159576 | consumed samples: 313152 | elapsed time per iteration (ms): 18950.8 | learning rate: 6.000E-05 | global batch size: 112 | lm loss: 6.477815E+00 | loss scale: 2048.0 | grad norm: 106401.719 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
saving checkpoint at iteration 7500 to /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints
[2021-09-28 00:24:01,519] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/global_step7500/mp_rank_00_model_states.pt
successfully saved checkpoint at iteration 7500 to /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints
time (ms) | save-checkpoint: 17115.61
iteration 7510/ 159576 | consumed samples: 314272 | elapsed time per iteration (ms): 21118.3 | learning rate: 6.000E-05 | global batch size: 112 | lm loss: 6.494813E+00 | loss scale: 2048.0 | grad norm: 111065.941 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7520/ 159576 | consumed samples: 315392 | elapsed time per iteration (ms): 19805.8 | learning rate: 6.000E-05 | global batch size: 112 | lm loss: 6.508061E+00 | loss scale: 2048.0 | grad norm: 108163.665 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
[2021-09-28 00:32:54] PULSE: tr8-104B is running for 6:49:28 since 2021-09-27T17:43:26 (1271196 on 'gpu_p13' partition (r7i7n[6-8],r8i0n[0-8],r8i1n[0-4],r8i7n[3-8],r9i0n[0-6,8],r9i1n[0-8],r9i2n0,r9i4n8,r9i5n[0-8],r9i6n[0-8],r9i7n[3-6])
iteration 7530/ 159576 | consumed samples: 316512 | elapsed time per iteration (ms): 19675.1 | learning rate: 6.000E-05 | global batch size: 112 | lm loss: 6.531902E+00 | loss scale: 2048.0 | grad norm: 113133.301 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7540/ 159576 | consumed samples: 317632 | elapsed time per iteration (ms): 19542.7 | learning rate: 6.000E-05 | global batch size: 112 | lm loss: 6.512622E+00 | loss scale: 2048.0 | grad norm: 124840.322 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7550/ 159576 | consumed samples: 318752 | elapsed time per iteration (ms): 19516.2 | learning rate: 6.000E-05 | global batch size: 112 | lm loss: 6.501436E+00 | loss scale: 2048.0 | grad norm: 133229.950 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7560/ 159576 | consumed samples: 319872 | elapsed time per iteration (ms): 19503.1 | learning rate: 6.000E-05 | global batch size: 112 | lm loss: 6.490542E+00 | loss scale: 2048.0 | grad norm: 71964.190 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7570/ 159576 | consumed samples: 320992 | elapsed time per iteration (ms): 19421.6 | learning rate: 6.000E-05 | global batch size: 112 | lm loss: 6.521871E+00 | loss scale: 2048.0 | grad norm: 88801.230 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7580/ 159576 | consumed samples: 322112 | elapsed time per iteration (ms): 19481.2 | learning rate: 6.000E-05 | global batch size: 112 | lm loss: 6.505743E+00 | loss scale: 2048.0 | grad norm: 284454.050 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7590/ 159576 | consumed samples: 323232 | elapsed time per iteration (ms): 19560.8 | learning rate: 6.000E-05 | global batch size: 112 | lm loss: 6.490807E+00 | loss scale: 2048.0 | grad norm: 110863.220 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7600/ 159576 | consumed samples: 324352 | elapsed time per iteration (ms): 19566.7 | learning rate: 6.000E-05 | global batch size: 112 | lm loss: 6.490352E+00 | loss scale: 2048.0 | grad norm: 99394.185 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7610/ 159576 | consumed samples: 325472 | elapsed time per iteration (ms): 19546.1 | learning rate: 6.000E-05 | global batch size: 112 | lm loss: 6.487664E+00 | loss scale: 2048.0 | grad norm: 98963.244 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7620/ 159576 | consumed samples: 326592 | elapsed time per iteration (ms): 19448.4 | learning rate: 6.000E-05 | global batch size: 112 | lm loss: 6.495935E+00 | loss scale: 2048.0 | grad norm: 80186.399 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7630/ 159576 | consumed samples: 327712 | elapsed time per iteration (ms): 19586.5 | learning rate: 6.000E-05 | global batch size: 112 | lm loss: 6.485136E+00 | loss scale: 2048.0 | grad norm: 90794.926 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7640/ 159576 | consumed samples: 328832 | elapsed time per iteration (ms): 19579.4 | learning rate: 6.000E-05 | global batch size: 112 | lm loss: 6.484132E+00 | loss scale: 2048.0 | grad norm: 120050.606 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7650/ 159576 | consumed samples: 329952 | elapsed time per iteration (ms): 19625.6 | learning rate: 6.000E-05 | global batch size: 112 | lm loss: 6.474982E+00 | loss scale: 2048.0 | grad norm: 132690.701 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7660/ 159576 | consumed samples: 331120 | elapsed time per iteration (ms): 19869.8 | learning rate: 6.000E-05 | global batch size: 128 | lm loss: 6.502007E+00 | loss scale: 2048.0 | grad norm: 141077.545 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7670/ 159576 | consumed samples: 332400 | elapsed time per iteration (ms): 20699.4 | learning rate: 6.000E-05 | global batch size: 128 | lm loss: 6.459695E+00 | loss scale: 2048.0 | grad norm: 170892.684 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7680/ 159576 | consumed samples: 333680 | elapsed time per iteration (ms): 20602.2 | learning rate: 6.000E-05 | global batch size: 128 | lm loss: 6.471451E+00 | loss scale: 2048.0 | grad norm: 186408.144 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7690/ 159576 | consumed samples: 334960 | elapsed time per iteration (ms): 20925.9 | learning rate: 6.000E-05 | global batch size: 128 | lm loss: 6.450164E+00 | loss scale: 2048.0 | grad norm: 126551.055 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7700/ 159576 | consumed samples: 336240 | elapsed time per iteration (ms): 20872.8 | learning rate: 6.000E-05 | global batch size: 128 | lm loss: 6.483758E+00 | loss scale: 2048.0 | grad norm: 113828.612 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
[2021-09-28 01:32:21] PULSE: tr8-104B is running for 7:48:55 since 2021-09-27T17:43:26 (1271196 on 'gpu_p13' partition (r7i7n[6-8],r8i0n[0-8],r8i1n[0-4],r8i7n[3-8],r9i0n[0-6,8],r9i1n[0-8],r9i2n0,r9i4n8,r9i5n[0-8],r9i6n[0-8],r9i7n[3-6])
iteration 7710/ 159576 | consumed samples: 337520 | elapsed time per iteration (ms): 20786.9 | learning rate: 6.000E-05 | global batch size: 128 | lm loss: 6.474139E+00 | loss scale: 2048.0 | grad norm: 92984.196 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7720/ 159576 | consumed samples: 338800 | elapsed time per iteration (ms): 20911.7 | learning rate: 6.000E-05 | global batch size: 128 | lm loss: 6.465121E+00 | loss scale: 2048.0 | grad norm: 101949.520 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7730/ 159576 | consumed samples: 340080 | elapsed time per iteration (ms): 20160.2 | learning rate: 6.000E-05 | global batch size: 128 | lm loss: 6.493755E+00 | loss scale: 1024.0 | grad norm: 47045.415 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7740/ 159576 | consumed samples: 341360 | elapsed time per iteration (ms): 20757.9 | learning rate: 6.000E-05 | global batch size: 128 | lm loss: 6.475374E+00 | loss scale: 1024.0 | grad norm: 62044.012 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7750/ 159576 | consumed samples: 342640 | elapsed time per iteration (ms): 20801.0 | learning rate: 6.000E-05 | global batch size: 128 | lm loss: 6.480064E+00 | loss scale: 1024.0 | grad norm: 55223.754 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7760/ 159576 | consumed samples: 343920 | elapsed time per iteration (ms): 20712.1 | learning rate: 6.000E-05 | global batch size: 128 | lm loss: 6.477321E+00 | loss scale: 1024.0 | grad norm: 75612.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7770/ 159576 | consumed samples: 345200 | elapsed time per iteration (ms): 20773.8 | learning rate: 6.000E-05 | global batch size: 128 | lm loss: 6.486430E+00 | loss scale: 1024.0 | grad norm: 57309.889 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7780/ 159576 | consumed samples: 346480 | elapsed time per iteration (ms): 20686.3 | learning rate: 6.000E-05 | global batch size: 128 | lm loss: 6.465924E+00 | loss scale: 1024.0 | grad norm: 78208.337 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7790/ 159576 | consumed samples: 347760 | elapsed time per iteration (ms): 20744.2 | learning rate: 6.000E-05 | global batch size: 128 | lm loss: 6.439983E+00 | loss scale: 1024.0 | grad norm: 85978.082 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7800/ 159576 | consumed samples: 349040 | elapsed time per iteration (ms): 20858.0 | learning rate: 6.000E-05 | global batch size: 128 | lm loss: 6.466323E+00 | loss scale: 1024.0 | grad norm: 83254.794 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7810/ 159576 | consumed samples: 350320 | elapsed time per iteration (ms): 20728.1 | learning rate: 6.000E-05 | global batch size: 128 | lm loss: 6.452026E+00 | loss scale: 1024.0 | grad norm: 82300.274 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7820/ 159576 | consumed samples: 351600 | elapsed time per iteration (ms): 20746.4 | learning rate: 6.000E-05 | global batch size: 128 | lm loss: 6.471143E+00 | loss scale: 1024.0 | grad norm: 70196.821 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7830/ 159576 | consumed samples: 352880 | elapsed time per iteration (ms): 20801.6 | learning rate: 6.000E-05 | global batch size: 128 | lm loss: 6.484294E+00 | loss scale: 1024.0 | grad norm: 52460.842 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7840/ 159576 | consumed samples: 354160 | elapsed time per iteration (ms): 20885.5 | learning rate: 6.000E-05 | global batch size: 128 | lm loss: 6.492403E+00 | loss scale: 1024.0 | grad norm: 61833.655 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7850/ 159576 | consumed samples: 355440 | elapsed time per iteration (ms): 20657.1 | learning rate: 6.000E-05 | global batch size: 128 | lm loss: 6.466279E+00 | loss scale: 1024.0 | grad norm: 62285.100 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7860/ 159576 | consumed samples: 356720 | elapsed time per iteration (ms): 19964.7 | learning rate: 6.000E-05 | global batch size: 128 | lm loss: 6.448762E+00 | loss scale: 512.0 | grad norm: 76192.061 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7870/ 159576 | consumed samples: 358000 | elapsed time per iteration (ms): 20780.6 | learning rate: 6.000E-05 | global batch size: 128 | lm loss: 6.468709E+00 | loss scale: 512.0 | grad norm: 27166.098 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7880/ 159576 | consumed samples: 359280 | elapsed time per iteration (ms): 20507.3 | learning rate: 6.000E-05 | global batch size: 128 | lm loss: 6.619281E+00 | loss scale: 512.0 | grad norm: 27451.209 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
[2021-09-28 02:32:25] PULSE: tr8-104B is scheduled to start in 17:52:43 (at 2021-09-28T20:25:09) (1277218 on 'gpu_p13' partition)
[2021-09-28 02:32:25] PULSE: tr8-104B is waiting for the previous job to finish before scheduling a new one using the dependency mechanism (1277216 on 'gpu_p13' partition)
[2021-09-28 02:32:25] PULSE: tr8-104B is running for 8:48:59 since 2021-09-27T17:43:26 (1271196 on 'gpu_p13' partition (r7i7n[6-8],r8i0n[0-8],r8i1n[0-4],r8i7n[3-8],r9i0n[0-6,8],r9i1n[0-8],r9i2n0,r9i4n8,r9i5n[0-8],r9i6n[0-8],r9i7n[3-6])
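The progress records above are pipe-delimited key/value fields. A minimal sketch of pulling structured values out of one such record for plotting or monitoring (`parse_record` is a hypothetical helper, not part of Megatron; it assumes only the field layout visible in these logs):

```python
import re

def parse_record(line: str) -> dict:
    """Parse one Megatron-style progress line into a field dict."""
    fields = {}
    # The leading "iteration N/ TOTAL" fragment has no colon, so handle it first.
    m = re.match(r"\s*iteration\s+(\d+)/\s*(\d+)", line)
    if m:
        fields["iteration"] = int(m.group(1))
        fields["total_iterations"] = int(m.group(2))
    # Remaining fields look like "key: value" separated by "|".
    for part in line.split("|"):
        key, sep, value = part.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields

rec = parse_record(
    "iteration 7890/ 159576 | consumed samples: 360560 | lm loss: 6.639037E+00 | loss scale: 512.0"
)
```

Fields such as `lm loss` come back as strings (scientific notation like `6.639037E+00` parses directly with `float()`); trailing fragments without a colon, like the dangling `time (ms)`, are skipped.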
iteration 7890/ 159576 | consumed samples: 360560 | elapsed time per iteration (ms): 20685.2 | learning rate: 6.000E-05 | global batch size: 128 | lm loss: 6.639037E+00 | loss scale: 512.0 | grad norm: 21160.659 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7900/ 159576 | consumed samples: 361840 | elapsed time per iteration (ms): 20486.0 | learning rate: 6.000E-05 | global batch size: 128 | lm loss: 7.220924E+00 | loss scale: 512.0 | grad norm: 53815.762 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7910/ 159576 | consumed samples: 363120 | elapsed time per iteration (ms): 20468.9 | learning rate: 6.000E-05 | global batch size: 128 | lm loss: 7.521174E+00 | loss scale: 512.0 | grad norm: 36754.779 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7920/ 159576 | consumed samples: 364400 | elapsed time per iteration (ms): 20813.9 | learning rate: 6.000E-05 | global batch size: 128 | lm loss: 7.992101E+00 | loss scale: 512.0 | grad norm: 30259.595 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7930/ 159576 | consumed samples: 365680 | elapsed time per iteration (ms): 20655.4 | learning rate: 6.000E-05 | global batch size: 128 | lm loss: 7.722378E+00 | loss scale: 512.0 | grad norm: 10480.959 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7940/ 159576 | consumed samples: 366960 | elapsed time per iteration (ms): 20950.6 | learning rate: 6.000E-05 | global batch size: 128 | lm loss: 7.599706E+00 | loss scale: 512.0 | grad norm: 6665.941 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7950/ 159576 | consumed samples: 368240 | elapsed time per iteration (ms): 20603.6 | learning rate: 6.000E-05 | global batch size: 128 | lm loss: 7.160472E+00 | loss scale: 512.0 | grad norm: 8111.009 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7960/ 159576 | consumed samples: 369520 | elapsed time per iteration (ms): 20624.6 | learning rate: 6.000E-05 | global batch size: 128 | lm loss: 7.089828E+00 | loss scale: 512.0 | grad norm: 6778.536 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7970/ 159576 | consumed samples: 370800 | elapsed time per iteration (ms): 20691.0 | learning rate: 6.000E-05 | global batch size: 128 | lm loss: 7.228701E+00 | loss scale: 512.0 | grad norm: 20747.579 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7980/ 159576 | consumed samples: 372080 | elapsed time per iteration (ms): 20583.0 | learning rate: 6.000E-05 | global batch size: 128 | lm loss: 7.185581E+00 | loss scale: 512.0 | grad norm: 12240.877 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7990/ 159576 | consumed samples: 373360 | elapsed time per iteration (ms): 20386.0 | learning rate: 6.000E-05 | global batch size: 128 | lm loss: 7.279394E+00 | loss scale: 512.0 | grad norm: 11910.373 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
[2021-09-28 03:13:26,386] [INFO] [logging.py:68:log_dist] [Rank 0] step=8000, skipped=20, lr=[5.99997953854461e-05, 5.99997953854461e-05], mom=[(0.9, 0.999), (0.9, 0.999)]
iteration 8000/ 159576 | consumed samples: 374640 | elapsed time per iteration (ms): 20647.9 | learning rate: 6.000E-05 | global batch size: 128 | lm loss: 7.117274E+00 | loss scale: 512.0 | grad norm: 6363.554 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
steps: 8000 loss: 6.9948 iter time (s): 0.010 samples/sec: 12373.041
------------------------------------------------------------------------------------------------
validation loss at iteration 8000 | lm loss value: 6.930026E+00 | lm loss PPL: 1.022520E+03 |
------------------------------------------------------------------------------------------------
iteration 8010/ 159576 | consumed samples: 375920 | elapsed time per iteration (ms): 22250.2 | learning rate: 6.000E-05 | global batch size: 128 | lm loss: 6.907596E+00 | loss scale: 512.0 | grad norm: 5175.818 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 8020/ 159576 | consumed samples: 377200 | elapsed time per iteration (ms): 20702.8 | learning rate: 6.000E-05 | global batch size: 128 | lm loss: 6.903972E+00 | loss scale: 512.0 | grad norm: 8915.422 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 8030/ 159576 | consumed samples: 378544 | elapsed time per iteration (ms): 21181.5 | learning rate: 6.000E-05 | global batch size: 144 | lm loss: 6.942516E+00 | loss scale: 512.0 | grad norm: 8113.065 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 8040/ 159576 | consumed samples: 379984 | elapsed time per iteration (ms): 21914.5 | learning rate: 6.000E-05 | global batch size: 144 | lm loss: 6.923864E+00 | loss scale: 512.0 | grad norm: 19249.730 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 8050/ 159576 | consumed samples: 381424 | elapsed time per iteration (ms): 21865.5 | learning rate: 6.000E-05 | global batch size: 144 | lm loss: 6.876669E+00 | loss scale: 512.0 | grad norm: 7890.746 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
[2021-09-28 03:32:27] PULSE: tr8-104B is scheduled to start in 19:12:32 (at 2021-09-28T22:45:00) (1277218 on 'gpu_p13' partition)
[2021-09-28 03:32:27] PULSE: tr8-104B is waiting for the previous job to finish before scheduling a new one using the dependency mechanism (1277295_[1-10%1] on 'gpu_p13' partition)
[2021-09-28 03:32:27] PULSE: tr8-104B is running for 9:49:01 since 2021-09-27T17:43:26 (1271196 on 'gpu_p13' partition (r7i7n[6-8],r8i0n[0-8],r8i1n[0-4],r8i7n[3-8],r9i0n[0-6,8],r9i1n[0-8],r9i2n0,r9i4n8,r9i5n[0-8],r9i6n[0-8],r9i7n[3-6])
iteration 8060/ 159576 | consumed samples: 382864 | elapsed time per iteration (ms): 21779.1 | learning rate: 6.000E-05 | global batch size: 144 | lm loss: 6.788055E+00 | loss scale: 512.0 | grad norm: 9618.538 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 8070/ 159576 | consumed samples: 384304 | elapsed time per iteration (ms): 21643.3 | learning rate: 6.000E-05 | global batch size: 144 | lm loss: 6.808229E+00 | loss scale: 512.0 | grad norm: 8857.044 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 8080/ 159576 | consumed samples: 385744 | elapsed time per iteration (ms): 21639.1 | learning rate: 6.000E-05 | global batch size: 144 | lm loss: 6.901846E+00 | loss scale: 512.0 | grad norm: 8983.602 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 8090/ 159576 | consumed samples: 387184 | elapsed time per iteration (ms): 22052.4 | learning rate: 6.000E-05 | global batch size: 144 | lm loss: 6.863363E+00 | loss scale: 512.0 | grad norm: 9399.920 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 8100/ 159576 | consumed samples: 388624 | elapsed time per iteration (ms): 21866.1 | learning rate: 6.000E-05 | global batch size: 144 | lm loss: 6.843295E+00 | loss scale: 512.0 | grad norm: 8690.802 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 8110/ 159576 | consumed samples: 390064 | elapsed time per iteration (ms): 21853.1 | learning rate: 6.000E-05 | global batch size: 144 | lm loss: 6.893594E+00 | loss scale: 512.0 | grad norm: 13780.366 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 8120/ 159576 | consumed samples: 391504 | elapsed time per iteration (ms): 21812.6 | learning rate: 6.000E-05 | global batch size: 144 | lm loss: 6.924708E+00 | loss scale: 512.0 | grad norm: 7097.791 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 8130/ 159576 | consumed samples: 392944 | elapsed time per iteration (ms): 21586.9 | learning rate: 6.000E-05 | global batch size: 144 | lm loss: 6.829758E+00 | loss scale: 512.0 | grad norm: 7266.647 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 8140/ 159576 | consumed samples: 394384 | elapsed time per iteration (ms): 21935.4 | learning rate: 6.000E-05 | global batch size: 144 | lm loss: 6.820535E+00 | loss scale: 512.0 | grad norm: 7758.235 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 8150/ 159576 | consumed samples: 395824 | elapsed time per iteration (ms): 21921.3 | learning rate: 6.000E-05 | global batch size: 144 | lm loss: 6.822125E+00 | loss scale: 512.0 | grad norm: 6965.512 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 8160/ 159576 | consumed samples: 397264 | elapsed time per iteration (ms): 21703.6 | learning rate: 6.000E-05 | global batch size: 144 | lm loss: 6.756792E+00 | loss scale: 512.0 | grad norm: 9871.280 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 8170/ 159576 | consumed samples: 398704 | elapsed time per iteration (ms): 21847.9 | learning rate: 6.000E-05 | global batch size: 144 | lm loss: 6.773450E+00 | loss scale: 512.0 | grad norm: 12746.115 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 8180/ 159576 | consumed samples: 400144 | elapsed time per iteration (ms): 21833.8 | learning rate: 6.000E-05 | global batch size: 144 | lm loss: 6.785934E+00 | loss scale: 512.0 | grad norm: 5598.866 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 8190/ 159576 | consumed samples: 401584 | elapsed time per iteration (ms): 21797.4 | learning rate: 6.000E-05 | global batch size: 144 | lm loss: 6.870234E+00 | loss scale: 512.0 | grad norm: 6782.384 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 8200/ 159576 | consumed samples: 403024 | elapsed time per iteration (ms): 21810.1 | learning rate: 6.000E-05 | global batch size: 144 | lm loss: 6.838039E+00 | loss scale: 512.0 | grad norm: 9577.527 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 8210/ 159576 | consumed samples: 404464 | elapsed time per iteration (ms): 21905.3 | learning rate: 6.000E-05 | global batch size: 144 | lm loss: 6.807652E+00 | loss scale: 512.0 | grad norm: 11918.248 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
[2021-09-28 04:33:02] PULSE: tr8-104B is scheduled to start in 18:11:57 (at 2021-09-28T22:45:00) (1277218 on 'gpu_p13' partition)
[2021-09-28 04:33:02] PULSE: tr8-104B is waiting for the previous job to finish before scheduling a new one using the dependency mechanism (1277295_[1-10%1] on 'gpu_p13' partition)
[2021-09-28 04:33:02] PULSE: tr8-104B is running for 10:49:36 since 2021-09-27T17:43:26 (1271196 on 'gpu_p13' partition (r7i7n[6-8],r8i0n[0-8],r8i1n[0-4],r8i7n[3-8],r9i0n[0-6,8],r9i1n[0-8],r9i2n0,r9i4n8,r9i5n[0-8],r9i6n[0-8],r9i7n[3-6])
iteration 8220/ 159576 | consumed samples: 405904 | elapsed time per iteration (ms): 21977.1 | learning rate: 6.000E-05 | global batch size: 144 | lm loss: 6.819595E+00 | loss scale: 512.0 | grad norm: 6882.121 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 8230/ 159576 | consumed samples: 407344 | elapsed time per iteration (ms): 21630.3 | learning rate: 6.000E-05 | global batch size: 144 | lm loss: 6.880849E+00 | loss scale: 512.0 | grad norm: 17414.946 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 8240/ 159576 | consumed samples: 408784 | elapsed time per iteration (ms): 21894.4 | learning rate: 6.000E-05 | global batch size: 144 | lm loss: 6.930541E+00 | loss scale: 512.0 | grad norm: 7836.035 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 8250/ 159576 | consumed samples: 410224 | elapsed time per iteration (ms): 21731.4 | learning rate: 6.000E-05 | global batch size: 144 | lm loss: 6.906449E+00 | loss scale: 512.0 | grad norm: 7978.667 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 8260/ 159576 | consumed samples: 411664 | elapsed time per iteration (ms): 21776.5 | learning rate: 6.000E-05 | global batch size: 144 | lm loss: 6.893109E+00 | loss scale: 512.0 | grad norm: 9114.270 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 8270/ 159576 | consumed samples: 413104 | elapsed time per iteration (ms): 22166.2 | learning rate: 6.000E-05 | global batch size: 144 | lm loss: 6.885992E+00 | loss scale: 512.0 | grad norm: 13085.411 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 8280/ 159576 | consumed samples: 414544 | elapsed time per iteration (ms): 21762.3 | learning rate: 6.000E-05 | global batch size: 144 | lm loss: 6.789729E+00 | loss scale: 512.0 | grad norm: 11443.626 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 8290/ 159576 | consumed samples: 415984 | elapsed time per iteration (ms): 21743.1 | learning rate: 6.000E-05 | global batch size: 144 | lm loss: 6.784861E+00 | loss scale: 512.0 | grad norm: 10437.240 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 8300/ 159576 | consumed samples: 417424 | elapsed time per iteration (ms): 21878.0 | learning rate: 6.000E-05 | global batch size: 144 | lm loss: 6.831153E+00 | loss scale: 512.0 | grad norm: 6842.857 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 8310/ 159576 | consumed samples: 418864 | elapsed time per iteration (ms): 21680.7 | learning rate: 6.000E-05 | global batch size: 144 | lm loss: 6.847891E+00 | loss scale: 512.0 | grad norm: 8236.158 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 8320/ 159576 | consumed samples: 420304 | elapsed time per iteration (ms): 21650.4 | learning rate: 6.000E-05 | global batch size: 144 | lm loss: 6.831273E+00 | loss scale: 512.0 | grad norm: 10757.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 8330/ 159576 | consumed samples: 421744 | elapsed time per iteration (ms): 21761.1 | learning rate: 6.000E-05 | global batch size: 144 | lm loss: 6.866577E+00 | loss scale: 512.0 | grad norm: 9414.173 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 8340/ 159576 | consumed samples: 423184 | elapsed time per iteration (ms): 22000.3 | learning rate: 6.000E-05 | global batch size: 144 | lm loss: 6.927114E+00 | loss scale: 512.0 | grad norm: 22264.468 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 8350/ 159576 | consumed samples: 424624 | elapsed time per iteration (ms): 21732.0 | learning rate: 6.000E-05 | global batch size: 144 | lm loss: 7.098891E+00 | loss scale: 512.0 | grad norm: 10280.295 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 8360/ 159576 | consumed samples: 426160 | elapsed time per iteration (ms): 22517.6 | learning rate: 6.000E-05 | global batch size: 160 | lm loss: 6.958164E+00 | loss scale: 1024.0 | grad norm: 13178.434 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 8370/ 159576 | consumed samples: 427760 | elapsed time per iteration (ms): 23182.1 | learning rate: 6.000E-05 | global batch size: 160 | lm loss: 6.889060E+00 | loss scale: 1024.0 | grad norm: 18842.234 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 8380/ 159576 | consumed samples: 429360 | elapsed time per iteration (ms): 23097.1 | learning rate: 6.000E-05 | global batch size: 160 | lm loss: 6.878168E+00 | loss scale: 1024.0 | grad norm: 18421.706 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
[2021-09-28 05:32:36] PULSE: tr8-104B is scheduled to start in 17:12:23 (at 2021-09-28T22:45:00) (1277218 on 'gpu_p13' partition)
[2021-09-28 05:32:36] PULSE: tr8-104B is waiting for the previous job to finish before scheduling a new one using the dependency mechanism (1277295_[1-10%1] on 'gpu_p13' partition)
[2021-09-28 05:32:36] PULSE: tr8-104B is running for 11:49:10 since 2021-09-27T17:43:26 (1271196 on 'gpu_p13' partition (r7i7n[6-8],r8i0n[0-8],r8i1n[0-4],r8i7n[3-8],r9i0n[0-6,8],r9i1n[0-8],r9i2n0,r9i4n8,r9i5n[0-8],r9i6n[0-8],r9i7n[3-6])
iteration 8390/ 159576 | consumed samples: 430960 | elapsed time per iteration (ms): 22911.1 | learning rate: 6.000E-05 | global batch size: 160 | lm loss: 6.836983E+00 | loss scale: 1024.0 | grad norm: 21055.325 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 8400/ 159576 | consumed samples: 432560 | elapsed time per iteration (ms): 23311.7 | learning rate: 6.000E-05 | global batch size: 160 | lm loss: 6.867126E+00 | loss scale: 1024.0 | grad norm: 13309.684 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 8410/ 159576 | consumed samples: 434160 | elapsed time per iteration (ms): 22945.0 | learning rate: 6.000E-05 | global batch size: 160 | lm loss: 6.896465E+00 | loss scale: 1024.0 | grad norm: 24249.264 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 8420/ 159576 | consumed samples: 435760 | elapsed time per iteration (ms): 22797.0 | learning rate: 6.000E-05 | global batch size: 160 | lm loss: 6.923830E+00 | loss scale: 1024.0 | grad norm: 16621.010 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 8430/ 159576 | consumed samples: 437360 | elapsed time per iteration (ms): 23019.9 | learning rate: 6.000E-05 | global batch size: 160 | lm loss: 6.940806E+00 | loss scale: 1024.0 | grad norm: 15050.371 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 8440/ 159576 | consumed samples: 438960 | elapsed time per iteration (ms): 23026.2 | learning rate: 6.000E-05 | global batch size: 160 | lm loss: 6.984757E+00 | loss scale: 1024.0 | grad norm: 22968.730 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 8450/ 159576 | consumed samples: 440560 | elapsed time per iteration (ms): 22903.0 | learning rate: 6.000E-05 | global batch size: 160 | lm loss: 6.970832E+00 | loss scale: 1024.0 | grad norm: 25206.012 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 8460/ 159576 | consumed samples: 442160 | elapsed time per iteration (ms): 22992.7 | learning rate: 6.000E-05 | global batch size: 160 | lm loss: 6.992513E+00 | loss scale: 1024.0 | grad norm: 9219.678 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 8470/ 159576 | consumed samples: 443760 | elapsed time per iteration (ms): 23036.6 | learning rate: 6.000E-05 | global batch size: 160 | lm loss: 7.053975E+00 | loss scale: 1024.0 | grad norm: 9743.104 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 8480/ 159576 | consumed samples: 445360 | elapsed time per iteration (ms): 22710.5 | learning rate: 6.000E-05 | global batch size: 160 | lm loss: 7.087634E+00 | loss scale: 1024.0 | grad norm: 36403.836 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 8490/ 159576 | consumed samples: 446960 | elapsed time per iteration (ms): 22994.9 | learning rate: 6.000E-05 | global batch size: 160 | lm loss: 7.142048E+00 | loss scale: 1024.0 | grad norm: 8807.945 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 8500/ 159576 | consumed samples: 448560 | elapsed time per iteration (ms): 22707.3 | learning rate: 6.000E-05 | global batch size: 160 | lm loss: 7.160313E+00 | loss scale: 1024.0 | grad norm: 9148.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 8510/ 159576 | consumed samples: 450160 | elapsed time per iteration (ms): 22963.9 | learning rate: 6.000E-05 | global batch size: 160 | lm loss: 7.277474E+00 | loss scale: 1024.0 | grad norm: 43448.626 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 8520/ 159576 | consumed samples: 451760 | elapsed time per iteration (ms): 19193.8 | learning rate: 6.000E-05 | global batch size: 160 | loss scale: 64.0 | grad norm: 5533.127 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 8530/ 159576 | consumed samples: 453360 | elapsed time per iteration (ms): 15554.5 | learning rate: 6.000E-05 | global batch size: 160 | loss scale: 1.0 | grad norm: 5533.127 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 8540/ 159576 | consumed samples: 454960
| elapsed time per iteration (ms): 15434.8 | learning rate: 6.000E-05 | global batch size: 160 | loss scale: 1.0 | grad norm: 5533.127 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 8550/ 159576 | consumed samples: 456560 | elapsed time per iteration (ms): 15729.0 | learning rate: 6.000E-05 | global batch size: 160 | loss scale: 1.0 | grad norm: 5533.127 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) [2021-09-28 06:32:50] PULSE: tr8-104B is scheduled to start in 17:29:26 (at 2021-09-29T00:02:17) (1277218 on 'gpu_p13' partition) [2021-09-28 06:32:50] PULSE: tr8-104B is waiting for the previous job to finish before scheduling a new one using the dependency mechanism (1277295_[1-10%1] on 'gpu_p13' partition) [2021-09-28 06:32:50] PULSE: tr8-104B is running for 12:49:24 since 2021-09-27T17:43:26 (1271196 on 'gpu_p13' partition (r7i7n[6-8],r8i0n[0-8],r8i1n[0-4],r8i7n[3-8],r9i0n[0-6,8],r9i1n[0-8],r9i2n0,r9i4n8,r9i5n[0-8],r9i6n[0-8],r9i7n[3-6]) iteration 8560/ 159576 | consumed samples: 458160 | elapsed time per iteration (ms): 15526.6 | learning rate: 6.000E-05 | global batch size: 160 | loss scale: 1.0 | grad norm: 5533.127 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 8570/ 159576 | consumed samples: 459760 | elapsed time per iteration (ms): 15343.9 | learning rate: 6.000E-05 | global batch size: 160 | loss scale: 1.0 | grad norm: 5533.127 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 8580/ 159576 | consumed samples: 461360 | elapsed time per iteration (ms): 15516.0 | learning rate: 6.000E-05 | global batch size: 160 | loss scale: 1.0 | grad norm: 5533.127 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 8590/ 159576 | consumed samples: 462960 | elapsed time per iteration (ms): 15788.5 | learning rate: 
6.000E-05 | global batch size: 160 | loss scale: 1.0 | grad norm: 5533.127 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 8600/ 159576 | consumed samples: 464560 | elapsed time per iteration (ms): 15421.5 | learning rate: 6.000E-05 | global batch size: 160 | loss scale: 1.0 | grad norm: 5533.127 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 8610/ 159576 | consumed samples: 466160 | elapsed time per iteration (ms): 15365.4 | learning rate: 6.000E-05 | global batch size: 160 | loss scale: 1.0 | grad norm: 5533.127 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 8620/ 159576 | consumed samples: 467760 | elapsed time per iteration (ms): 15460.6 | learning rate: 6.000E-05 | global batch size: 160 | loss scale: 1.0 | grad norm: 5533.127 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 8630/ 159576 | consumed samples: 469360 | elapsed time per iteration (ms): 15794.2 | learning rate: 6.000E-05 | global batch size: 160 | loss scale: 1.0 | grad norm: 5533.127 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 8640/ 159576 | consumed samples: 470960 | elapsed time per iteration (ms): 15928.5 | learning rate: 6.000E-05 | global batch size: 160 | loss scale: 1.0 | grad norm: 5533.127 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 8650/ 159576 | consumed samples: 472560 | elapsed time per iteration (ms): 15514.8 | learning rate: 6.000E-05 | global batch size: 160 | loss scale: 1.0 | grad norm: 5533.127 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 8660/ 159576 | consumed samples: 474320 | elapsed time per iteration (ms): 16639.1 | learning rate: 6.000E-05 | global batch size: 176 | loss scale: 1.0 
| grad norm: 5533.127 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 8670/ 159576 | consumed samples: 476080 | elapsed time per iteration (ms): 16569.6 | learning rate: 6.000E-05 | global batch size: 176 | loss scale: 1.0 | grad norm: 5533.127 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 8680/ 159576 | consumed samples: 477840 | elapsed time per iteration (ms): 16695.6 | learning rate: 6.000E-05 | global batch size: 176 | loss scale: 1.0 | grad norm: 5533.127 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 8690/ 159576 | consumed samples: 479600 | elapsed time per iteration (ms): 16700.3 | learning rate: 6.000E-05 | global batch size: 176 | loss scale: 1.0 | grad norm: 5533.127 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 8700/ 159576 | consumed samples: 481360 | elapsed time per iteration (ms): 16569.3 | learning rate: 6.000E-05 | global batch size: 176 | loss scale: 1.0 | grad norm: 5533.127 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 8710/ 159576 | consumed samples: 483120 | elapsed time per iteration (ms): 16526.6 | learning rate: 6.000E-05 | global batch size: 176 | loss scale: 1.0 | grad norm: 5533.127 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 8720/ 159576 | consumed samples: 484880 | elapsed time per iteration (ms): 16370.8 | learning rate: 6.000E-05 | global batch size: 176 | loss scale: 1.0 | grad norm: 5533.127 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 8730/ 159576 | consumed samples: 486640 | elapsed time per iteration (ms): 16678.1 | learning rate: 6.000E-05 | global batch size: 176 | loss scale: 1.0 | grad norm: 5533.127 | num zeros: 0.0 | number of 
skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 8740/ 159576 | consumed samples: 488400 | elapsed time per iteration (ms): 16715.4 | learning rate: 6.000E-05 | global batch size: 176 | loss scale: 1.0 | grad norm: 5533.127 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 8750/ 159576 | consumed samples: 490160 | elapsed time per iteration (ms): 16605.2 | learning rate: 6.000E-05 | global batch size: 176 | loss scale: 1.0 | grad norm: 5533.127 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 8760/ 159576 | consumed samples: 491920 | elapsed time per iteration (ms): 16522.8 | learning rate: 6.000E-05 | global batch size: 176 | loss scale: 1.0 | grad norm: 5533.127 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 8770/ 159576 | consumed samples: 493680 | elapsed time per iteration (ms): 16607.3 | learning rate: 6.000E-05 | global batch size: 176 | loss scale: 1.0 | grad norm: 5533.127 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) [2021-09-28 07:32:48] PULSE: tr8-104B is scheduled to start in 17:38:05 (at 2021-09-29T01:10:54) (1277218 on 'gpu_p13' partition) [2021-09-28 07:32:48] PULSE: tr8-104B is waiting for the previous job to finish before scheduling a new one using the dependency mechanism (1277295_[1-10%1] on 'gpu_p13' partition) [2021-09-28 07:32:48] PULSE: tr8-104B is running for 13:49:22 since 2021-09-27T17:43:26 (1271196 on 'gpu_p13' partition (r7i7n[6-8],r8i0n[0-8],r8i1n[0-4],r8i7n[3-8],r9i0n[0-6,8],r9i1n[0-8],r9i2n0,r9i4n8,r9i5n[0-8],r9i6n[0-8],r9i7n[3-6]) iteration 8780/ 159576 | consumed samples: 495440 | elapsed time per iteration (ms): 16798.5 | learning rate: 6.000E-05 | global batch size: 176 | loss scale: 1.0 | grad norm: 5533.127 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time 
iteration 8790/ 159576 | consumed samples: 497200 | elapsed time per iteration (ms): 16594.8 | learning rate: 6.000E-05 | global batch size: 176 | loss scale: 1.0 | grad norm: 5533.127 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 8800/ 159576 | consumed samples: 498960 | elapsed time per iteration (ms): 16863.3 | learning rate: 6.000E-05 | global batch size: 176 | loss scale: 1.0 | grad norm: 5533.127 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
srun: Job step aborted: Waiting up to 62 seconds for job step to finish.
Killing subprocess 30115
Killing subprocess 30116
Killing subprocess 72376
Killing subprocess 30117
Killing subprocess 72377
Killing subprocess 72378
Killing subprocess 30118
Main process received SIGTERM, exiting
[... several hundred further interleaved "Killing subprocess <pid>" and "Main process received SIGTERM, exiting" lines from the remaining nodes elided ...]
slurmstepd: error: *** STEP 1271196.0 ON r7i7n6 CANCELLED AT 2021-09-28T07:42:47 ***
[... remaining "Killing subprocess <pid>" and "Main process received SIGTERM, exiting" lines elided ...]
[2021-09-28 08:32:52] PULSE: ***ALERT: tr8-104B is not RUNNING or SCHEDULED! Alert someone at Eng WG***
[2021-09-28 09:33:05] PULSE: ***ALERT: tr8-104B is not RUNNING or SCHEDULED! Alert someone at Eng WG***
[2021-09-28 10:33:03] PULSE: ***ALERT: tr8-104B is not RUNNING or SCHEDULED! Alert someone at Eng WG***
[2021-09-28 11:33:17] PULSE: ***ALERT: tr8-104B is not RUNNING or SCHEDULED! Alert someone at Eng WG***
Traceback (most recent call last):
  File "/gpfswork/rech/six/commun/code/tr8-104B/bigscience/tools/slurm-status.py", line 177, in <module>
    main()
  File "/gpfswork/rech/six/commun/code/tr8-104B/bigscience/tools/slurm-status.py", line 172, in main
    send_email_alert_job_not_scheduled(args.job_name)
  File "/gpfswork/rech/six/commun/code/tr8-104B/bigscience/tools/slurm-status.py", line 61, in send_email_alert_job_not_scheduled
    send_email(subject, body)
  File "/gpfswork/rech/six/commun/code/tr8-104B/bigscience/tools/slurm-status.py", line 39, in send_email
    server = smtplib.SMTP("localhost")
  File "/gpfslocalsup/pub/anaconda-py3/2020.02/lib/python3.7/smtplib.py", line 251, in __init__
    (code, msg) = self.connect(host, port)
  File "/gpfslocalsup/pub/anaconda-py3/2020.02/lib/python3.7/smtplib.py", line 336, in connect
    self.sock = self._get_socket(host, port, self.timeout)
  File "/gpfslocalsup/pub/anaconda-py3/2020.02/lib/python3.7/smtplib.py", line 307, in _get_socket
    self.source_address)
  File "/gpfslocalsup/pub/anaconda-py3/2020.02/lib/python3.7/socket.py", line 728, in create_connection
    raise err
  File "/gpfslocalsup/pub/anaconda-py3/2020.02/lib/python3.7/socket.py", line 716, in create_connection
    sock.connect(sa)
ConnectionRefusedError: [Errno 111] Connection refused
[2021-09-28 12:33:29] PULSE: ***ALERT: tr8-104B is not RUNNING or SCHEDULED! Alert someone at Eng WG***
[2021-09-28 13:33:44] PULSE: ***ALERT: tr8-104B is not RUNNING or SCHEDULED! Alert someone at Eng WG***
[2021-09-28 14:34:11] PULSE: ***ALERT: tr8-104B is not RUNNING or SCHEDULED! Alert someone at Eng WG***
[2021-09-28 15:33:54] PULSE: ***ALERT: tr8-104B is not RUNNING or SCHEDULED! Alert someone at Eng WG***
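Note on the traceback above: the ConnectionRefusedError comes from `smtplib.SMTP("localhost")` finding no mail server listening on the node, so the PULSE watchdog's email-alert path crashed instead of delivering the alert. A minimal sketch of a more forgiving `send_email` follows — the function name and the `smtplib.SMTP("localhost")` call are taken from the traceback, but the From/To addresses and the fall-back-to-stderr behavior are assumptions for illustration, not what `slurm-status.py` actually does:

```python
import smtplib
import sys
from email.mime.text import MIMEText

def send_email(subject: str, body: str) -> bool:
    """Try to email an alert; fall back to stderr if no local SMTP server.

    Returns True if the email was handed to the SMTP server, False otherwise.
    """
    msg = MIMEText(body)
    msg["Subject"] = subject
    # Hypothetical addresses for illustration only.
    msg["From"] = "pulse-watchdog@localhost"
    msg["To"] = "eng-wg@localhost"
    try:
        server = smtplib.SMTP("localhost", timeout=10)
        try:
            server.send_message(msg)
        finally:
            server.quit()
        return True
    except OSError as exc:  # covers ConnectionRefusedError, timeouts, DNS errors
        # No local MTA (common on HPC compute nodes): log the alert instead of
        # crashing, so the watchdog script itself keeps running.
        print(f"ALERT (email delivery failed: {exc}): {subject}\n{body}",
              file=sys.stderr)
        return False
```

Catching `OSError` around the connect keeps a monitoring failure (no MTA) from masking the condition being monitored; the alert text still reaches the job log even when email does not.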